Why aov() with Error() gives three strata?


Jorge Fernando Saraiva de Menezes
Dear list users,

I am trying to learn Repeated measures ANOVA using the aov() interface, but
I'm struggling to understand its output.

According to tutorials on the web, the formula for a repeated measures
design is:

aov(Y ~ IV+ Error(SUBJECT/IV) )

This formula does work, but it returns three strata (Error: SUBJECT,
Error: SUBJECT:IV, Error: Within), when I would expect two strata (within
and between subjects). I've seen some tutorials that show exactly the same
setup but return only the first two strata.

Is it possible to have two or three strata depending on the data?
If there are always three strata, how does this fit the interpretation of
between vs. within effects?

Below is a reproducible example that gives three strata:

data(beavers)
# Stack both beavers into one data frame, labelled by animal.
data = data.frame(id = rep(c("beaver1","beaver2"), c(nrow(beaver1),nrow(beaver2))),
                  rbind(beaver1,beaver2))
data$activ = factor(data$activ)
# Balance the dataset to have 6 samples for every combination of beaver and
# activity.
balanced = split(data, interaction(data$id,data$activ))
sizes = sapply(balanced, nrow)
selected = lapply(sizes, sample.int, 6)   # 6 random row indices per cell
balanced = mapply(function(x,y) {x[y,]}, balanced, selected, SIMPLIFY=FALSE)
balanced = do.call(rbind, balanced)
aov(temp~activ+Error(id/activ), data=balanced)

Thanks,
Jorge


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Why aov() with Error() gives three strata?

Bert Gunter-2
Jorge:

FYI, *generally speaking,* queries that are mostly statistical in
nature, such as yours, are off topic here -- this list is about R
programming help, not statistical help. Having said that, you still
may get a useful response here -- the r-help/statistics intersection
*is* nonempty. However, if not, 2.5 suggestions:

1. Try posting to r-sig-mixed-models instead. Repeated measures are a
type of mixed/multilevel model and you may receive some useful
suggestions there, including alternative R approaches to fitting such
models (e.g. using lme() or lmer() ).

2. Alternatively, try posting to a statistics site like stats.stackexchange.com.

2.5. Or, if you can, the best idea might be to sit down with a local
statistics expert.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: Why aov() with Error() gives three strata?

Jorge Fernando Saraiva de Menezes
Bert, thanks for the reply, but I feel that my question is less about
statistics and more about the R interface, specifically because the output
of R seems different from that of other programs (Systat, for example,
gives a between and a within table instead of a three-level one).

I am familiar with the connection between mixed models and repeated
measures, and with how mixed models are essentially replacing aov models
due to their greater flexibility. But although I understand a little of the
logic behind mixed models, the aov error terms seem completely different to
me from lmer random effects.

I will post to the lists you suggested if nothing comes from here. However,
I had little luck on Stack Exchange when I tried there.

As for a local expert, I am once more in a corner. There are many people in
my department who excel in statistics, but none of them use R, which
drastically reduces their ability to explain the output of aov to me.


Re: Why aov() with Error() gives three strata?

Peter Dalgaard-2
At any rate:

Error(SUBJECT/IV)

specifies two random effects: SUBJECT and SUBJECT:IV. This is most easily understood if you conceptually arrange your data in a SUBJECT x IV table: One effect is a set of random errors added to each row, the other is a set of effects added to each cell.
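
As a small check of that expansion (a sketch only; `id` and `activ` are the column names from the beaver example, standing in for SUBJECT and IV), the nesting operator can be expanded with terms():

```r
# Expand the nesting operator used inside Error().
# `id` plays the role of SUBJECT and `activ` the role of IV.
error_terms <- attr(terms(~ id/activ), "term.labels")
print(error_terms)  # "id" "id:activ"
```

The two labels are the row effect and the cell effect of the SUBJECT x IV table.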

If you have more than one observation within each cell, then you need a third set of errors to account for differences within cells and this is labeled "Within" variation. With one observation per cell, this stratum disappears (as far as I recall, haven't checked).
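
One way to check this is sketched below, assuming the balanced beaver subset from the original post. The object names and the seed are illustrative. The same model is fitted once with 6 replicates per cell and once with the replicates averaged down to a single value per cell, and the stratum names are compared.

```r
# Balanced subset of the built-in beaver data: 6 random rows per
# id-by-activ cell, as in the original post (the seed is arbitrary).
set.seed(1)
beavers <- data.frame(
  id = factor(rep(c("beaver1", "beaver2"), c(nrow(beaver1), nrow(beaver2)))),
  rbind(beaver1, beaver2)
)
beavers$activ <- factor(beavers$activ)
cells <- split(beavers, interaction(beavers$id, beavers$activ))
replicated <- do.call(rbind, lapply(cells, function(cell) {
  cell[sample.int(nrow(cell), 6), ]
}))

# One observation per cell: replace the 6 replicates by their mean.
cell_means <- aggregate(temp ~ id + activ, data = replicated, FUN = mean)

fit_replicated <- aov(temp ~ activ + Error(id/activ), data = replicated)
fit_cell_means <- aov(temp ~ activ + Error(id/activ), data = cell_means)

# Stratum names of each fit; only the replicated fit has "Within".
print(names(fit_replicated))
print(names(fit_cell_means))
```

With only two beavers the tests themselves have almost no degrees of freedom, so this is only meant to show which strata appear, not to be a sensible analysis.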

Actually, this oversimplifies a little: What actually happens is that data gets split into

1: row means
2: differences between cells within rows
3: differences between observations within cells

and if the stratum variances are decreasing, then this can be interpreted using random effects as above, with variances of each component proportional to the successive differences. (All assuming that you have a balanced data layout, otherwise aov() is just the wrong tool.)
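
The three-way split can be reproduced by hand. The sketch below again assumes the balanced beaver subset from the original post (object names and seed are illustrative) and compares the sums of squares of the three pieces with the totals that aov() reports per error stratum.

```r
# Balanced subset of the built-in beaver data, 6 rows per id-by-activ cell
# (same construction as in the original post; the seed is arbitrary).
set.seed(42)
beavers <- data.frame(
  id = factor(rep(c("beaver1", "beaver2"), c(nrow(beaver1), nrow(beaver2)))),
  rbind(beaver1, beaver2)
)
beavers$activ <- factor(beavers$activ)
cells <- split(beavers, interaction(beavers$id, beavers$activ))
balanced <- do.call(rbind, lapply(cells, function(cell) {
  cell[sample.int(nrow(cell), 6), ]
}))

# Means at each level of the SUBJECT x IV table.
grand_mean <- mean(balanced$temp)
row_mean   <- ave(balanced$temp, balanced$id)
cell_mean  <- ave(balanced$temp, balanced$id, balanced$activ)

ss_by_hand <- c(
  sum((row_mean - grand_mean)^2),     # 1: row means
  sum((cell_mean - row_mean)^2),      # 2: cells within rows
  sum((balanced$temp - cell_mean)^2)  # 3: observations within cells
)

# Total sum of squares (all terms plus residuals) in each error stratum.
fit <- aov(temp ~ activ + Error(id/activ), data = balanced)
ss_from_aov <- sapply(summary(fit), function(stratum) {
  sum(stratum[[1]][["Sum Sq"]])
})

print(cbind(ss_by_hand, ss_from_aov))
```

The two columns should agree row by row for a balanced layout. With unbalanced data this simple correspondence is not expected to hold.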

-pd


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: Why aov() with Error() gives three strata?

Jorge Fernando Saraiva de Menezes
Thank you Peter, I now understand it. Indeed, when there is only one
replicate per combination of IV and SUBJECT, there are only two strata. The
reason this behavior is not observed in other statistical tools is that
they usually do not allow more than one measurement per combination of
period and subject.

Cheers,
Jorge





