indexing within panels in xyplot

Sebastian P. Luque
Dear R-helpers,

I need to show a linear fit through a subset of the data within each
combination of levels of two factors.  So I prepared an xyplot with
different panels for each level of one of the factors, and different
symbols within each panel for the levels of the second factor.  My problem
is selecting the subset of each combination through which the line should
be fit for subsequent plotting.  This hopefully shows the idea:


---<---------------cut here---------------start-------------->---
library(lattice)

toydf <- expand.grid(1:100, c("A", "B"),
                     c("pop1", "pop2", "pop3", "pop4", "pop5"))
toydf <- data.frame(facA = toydf[[3]], facB = toydf[[2]],
                    x = toydf[[1]], y = rnorm(1000))

xyplot(y ~ x | facA, groups = facB, data = toydf,
       panel.groups = function(x, y, subscripts, ...) {
         panel.xyplot(x, y, ...)
         lindx <- which(y[subscripts] == max(y[subscripts], na.rm = TRUE))
         xleft <- mean(x[lindx], na.rm = TRUE)
         fit <- lm(y[x >= xleft] ~ x[x >= xleft])
         panel.abline(fit)
       })
---<---------------cut here---------------end---------------->---

i.e. the left limit for fitting the line is defined by the mean of x
values where y is equal to the maximum y values, *within* each combination
of levels of both factors.  The above is giving me:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
        0 (non-NA) cases
In addition: Warning message:
no finite arguments to max; returning -Inf

which shows I'm not understanding how the 'subscripts' argument works.
I'd appreciate some pointers on what I'm doing wrong, as I haven't been
able to find help in the help pages and List archives.

Thanks,

--
Sebastian P. Luque

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: indexing within panels in xyplot

Frede Aakmann Tøgersen

Based on your first two sentences, I think the solution is to use

xyplot(y ~ x | facA, groups = facB, data = toydf, type = c("p", "r"))

Try it and see if this is what you want.


Best regards

Frede Aakmann Tøgersen
Scientist


Danish Institute of Agricultural Sciences
Research Centre Foulum
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Phone:   +45 8999 1900
Direct:  +45 8999 1878

E-mail:  [hidden email]
Web:   http://www.agrsci.org                               


> -----Original message-----
> From: [hidden email] [mailto:[hidden email]] On behalf of Sebastian Luque
> Sent: 21 February 2006 08:20
> To: [hidden email]
> Subject: [R] indexing within panels in xyplot
>
> [...]

Re: indexing within panels in xyplot

Deepayan Sarkar
In reply to this post by Sebastian P. Luque
On 2/21/06, Sebastian Luque <[hidden email]> wrote:

> [...]
>
> which shows I'm not understanding how the 'subscripts' argument works.
> I'd appreciate some pointers on what I'm doing wrong, as I haven't been
> able to find help in the help pages and List archives.

Well, there are exceptions to this rule, but generally x and y, when
they are passed on to the panel function, are _already_ subsetted, so
x[subscripts] makes absolutely no sense. Note how your panel function
calls

panel.xyplot(x, y, ...)

without referring to subscripts at all. The subscripts argument is
there for other variables (e.g. if you were drawing confidence
intervals, and had a separate vector in your data specifying the
interval lengths). In your case, there are no other variables
involved, so just get rid of the subscripts.
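
As a rough sketch of the kind of case where 'subscripts' is needed, the example below draws vertical interval bars from a separate column of the data frame. The column name 'halfwidth' and its values are made up for illustration and are not part of the original data; the toy data frame is also smaller than the one in the question.

```r
library(lattice)

set.seed(1)
## Toy data; 'halfwidth' is a hypothetical extra column (not in the
## original post) holding an interval half-width for each observation.
toydf <- expand.grid(x = 1:10, facA = c("pop1", "pop2", "pop3"))
toydf$y <- rnorm(nrow(toydf))
toydf$halfwidth <- runif(nrow(toydf), min = 0.2, max = 0.6)

p <- xyplot(y ~ x | facA, data = toydf,
            panel = function(x, y, subscripts, ...) {
              ## x and y are already restricted to the current panel
              panel.xyplot(x, y, ...)
              ## 'subscripts' gives the rows of toydf shown in this panel,
              ## so it is used to pick out the matching half-widths
              hw <- toydf$halfwidth[subscripts]
              panel.segments(x, y - hw, x, y + hw)
            })
print(p)
```

Here x and y are used as they arrive, and only the extra variable is indexed by 'subscripts'.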

Deepayan

Re: indexing within panels in xyplot

Sebastian P. Luque
"Deepayan Sarkar" <[hidden email]> wrote:

[...]

> Well, there are exceptions to this rule, but generally x and y, when
> they are passed on to the panel function, are _already_ subsetted, so
> x[subscripts] makes absolutely no sense. Note how your panel function
> calls

> panel.xyplot(x, y, ...)

> without referring to subscripts at all. The subscripts argument is
> there for other variables (e.g. if you were drawing confidence
> intervals, and had a separate vector in your data specifying the
> interval lengths). In your case, there are no other variables
> involved, so just get rid of the subscripts.

Thanks Deepayan, I was indeed quite confused about this.

I realized I needed to limit the fitted line to the range of x values the
line is fit to, so I changed to panel.curve:

---<---------------cut here---------------start-------------->---
xyplot(y ~ x | facA, groups = facB, data = toydf,
       panel.groups = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         lindx <- which(y == max(y, na.rm = TRUE))
         xleft <- mean(x[lindx], na.rm = TRUE)
         fit <- lm(y[x >= xleft] ~ x[x >= xleft])
         panel.curve(coef(fit)[1] + (coef(fit)[2] * x),
                     xleft, max(x, na.rm = TRUE))
       })

---<---------------cut here---------------end---------------->---

but can't find a way to color the line for each group differently.  I
tried passing a length-2 vector as a 'col' argument to panel.curve.
Unfortunately it's only picking the first, so that both lines get colored
the same.  I'm not sure, but it seems as if I need to use
'panel.superpose' directly to do this, as the help page suggests the above
would work?

Cheers,

--
Sebastian P. Luque

Re: indexing within panels in xyplot

Deepayan Sarkar
On 2/21/06, Sebastian Luque <[hidden email]> wrote:

> [...]
>
> but can't find a way to color the line for each group differently.  I
> tried passing a length-2 vector as a 'col' argument to panel.curve.
> Unfortunately it's only picking the first, so that both lines get colored
> the same.  I'm not sure, but it seems as if I need to use
> 'panel.superpose' directly to do this, as the help page suggests the above
> would work?

The (somewhat mysterious) solution is the following:


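xyplot(y ~ x | facA, groups = facB, data = toydf,
       panel = panel.superpose,
       panel.groups = function(x, y, col.line, ...) {
           panel.xyplot(x, y, ...)
           ## x position of the maximum y within this group (mean if tied)
           lindx <- which(y == max(y, na.rm = TRUE))
           xleft <- mean(x[lindx], na.rm = TRUE)
           ## fit only to the points at or to the right of xleft
           fit <- lm(y[x >= xleft] ~ x[x >= xleft])
           ## draw the fitted line over the fitted range, in the group's color
           panel.curve(coef(fit)[1] + (coef(fit)[2] * x),
                       from = xleft, to = max(x, na.rm = TRUE),
                       col = col.line)
       })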

This uses the fact that panel.groups is always supplied a 'col.line'
argument (along with many others) which has been suitably calculated
for each group (see panel.superpose for how this works).
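
One way to see this for yourself is to record what each call to panel.groups receives. The sketch below is only illustrative: the environment 'seen' and the smaller toy data frame are assumptions made for the example, and it relies on panel.superpose also passing a 'group.number' argument to panel.groups.

```r
library(lattice)

set.seed(1)
toydf <- expand.grid(x = 1:20, facB = c("A", "B"),
                     facA = c("pop1", "pop2"))
toydf$y <- rnorm(nrow(toydf))

## Environment used only to record what each group receives
seen <- new.env()

p <- xyplot(y ~ x | facA, groups = facB, data = toydf,
            panel = panel.superpose,
            panel.groups = function(x, y, col.line, group.number, ...) {
              ## remember the line color supplied for this group
              assign(paste0("group", group.number), col.line, envir = seen)
              panel.xyplot(x, y, ...)
            })
print(p)  # the panel functions only run when the plot is drawn

## One recorded color per group (seen$group1, seen$group2)
print(mget(ls(seen), envir = seen))
```

Each group gets a single color, taken by default from the current "superpose.line" setting.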

You are in fact using 'panel.superpose' directly, as that's what the
panel function defaults to when there is a 'groups' argument. However,
this will change in R-2.3.0, and to use a panel.groups argument, you
will need to explicitly specify panel=panel.superpose. Sorry for the
confusion, but I believe this to be more sensible in the bigger scheme
of things.

If you now want a different set of line colors, you can

(1) either modify the "superpose.line" parameter, or
(2) specify col.line = c('red', 'blue') etc in the xyplot call.
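
The two options might look roughly like this. It is a sketch on a smaller toy data frame, with a simplified panel.groups function ('pg', a name made up here) that fits the line to all points in each group rather than to the subset used above; option (1) is shown through the 'par.settings' argument, which changes the setting for that plot only.

```r
library(lattice)

set.seed(1)
toydf <- expand.grid(x = 1:20, facB = c("A", "B"),
                     facA = c("pop1", "pop2"))
toydf$y <- rnorm(nrow(toydf))

## Simplified panel.groups: points plus a per-group regression line
## drawn in the color supplied through 'col.line'
pg <- function(x, y, col.line, ...) {
  panel.xyplot(x, y, ...)
  panel.abline(lm(y ~ x), col = col.line)
}

## (1) change the "superpose.line" setting, here only for this plot
p1 <- xyplot(y ~ x | facA, groups = facB, data = toydf,
             panel = panel.superpose, panel.groups = pg,
             par.settings = list(superpose.line =
                                   list(col = c("red", "blue"))))

## (2) pass col.line directly in the xyplot call
p2 <- xyplot(y ~ x | facA, groups = facB, data = toydf,
             panel = panel.superpose, panel.groups = pg,
             col.line = c("red", "blue"))

print(p1)
print(p2)
```

Option (1) also affects anything else that reads "superpose.line", such as lines in a key, whereas option (2) only changes what is passed to the panel function.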

Hope that makes things a bit clearer.

Deepayan
--
http://www.stat.wisc.edu/~deepayan/
