How to Code Random Nested Variables within Two-way Fixed Model in lmer or lme

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

How to Code Random Nested Variables within Two-way Fixed Model in lmer or lme

David Robichaud
Hi All,

I am frustrated by mixed-effects model! I have searched the web for
hours, and found lots on the nested anova, but nothing useful on my
specific case, which is: a random factor (C) is nested within one of the
fixed-factors (A), and a second fixed factor (B) is crossed with the
first fixed factor:

C/A

A

B

A x B

My question: I have a functioning model using the aov command (see
below), and I would now would like to recode it, using a more flexible
command such as lme or lmer. Once I have the equivalent syntax down, I
would ideally like to re-run my analysis using "family = poisson", as CO
is actually count data.

I have a dataset including a response variable CO, measured once per
Week (for 11 weeks) at 13 Locations. The 13 Locations are divided into 2
habitat types (Control and Treatment).

Thus:

CO is a continuous response variable,

Week is a fixed categorical factor,

Habitat is a fixed categorical factor, and

Location is a random categorical factor nested within Habitat.

Here is my model in R:

mCO = aov(CO ~ Week * Habitat + Error(Location/Week))

summary(mCO)

And the output:

Error: Location

           Df Sum Sq Mean Sq F value   Pr(>F)  

Habitat     1 182566   182566   8.6519 0.01341 *

Residuals 11 232115    21101                  

 

Error: Location:Week

              Df Sum Sq Mean Sq F value     Pr(>F)    

Week          10 596431    59643 11.0534    7.5e-13 ***

Week:Habitat 10 196349    19635   3.6389 0.0003251 ***

Residuals    110 593551     5396                      

Given that this is a mixed model, I believe the appropriate error terms
are as follows:

For the F test of Habitat, the denominator MS is that for location/habitat;

For the F test of Week, the denominator MS is the residual; and

For the F test of Habitat x Week, the denominator MS is the residual.

My tinkering with lmer and lme have not produced results similar to the
above

For example,

m. = lmer(CO ~ Week * Habitat + (1|Habitat/Location))

anova(m.)

produces:

Analysis of Variance Table

               Df Sum Sq Mean Sq F value

Week           10 596431    59643 11.0534

Habitat        1   28652    28652   5.3100

Week:Habitat 10 196349    19635   3.6389

Any coding advice would be greatly appreciated!

Thanks for your consideration,

Dave Robichaud


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to Code Random Nested Variables within Two-way Fixed Model in lmer or lme

Helios de Rosario
Dear Dave, there are some inconsistencies in your explanation of the
problem. You said your variables are:

> CO is a continuous response variable,
>
> Week is a fixed categorical factor,
>
> Habitat is a fixed categorical factor, and
>
> Location is a random categorical factor nested within Habitat.

What does this last statement mean? Are the Locations identified with
the same names in both Habitats (e.g. Location=1,2,3,... for
Habitat=Control, and then Location=1,2,3... for Habitat=Treatment,
although the Locations of both Habitats have nothing to do with each
other? Or do all 13 Locations receive different names?

If Location is really nested within Habitat, you would be in the former
case, and then your random terms should include the interaction
"Location:Habitat". In the latter case, the random term would just be
"Location".

But then, your model with aov is:

> mCO = aov(CO ~ Week * Habitat + Error(Location/Week))

Since you don't include Habitat in the Error term, I would say that
Location is not really nested within Habitat. But then, why is Week
nested within Location? Do you mean that the effect of Week may be
affected by the random Location?

Anyway, your interpretation of the ANOVA table is misleading:

> Error: Location
>
>            Df Sum Sq Mean Sq F value   Pr(>F)  
>
> Habitat     1 182566   182566   8.6519 0.01341 *
>
> Residuals 11 232115    21101                  
>
>  
>
> Error: Location:Week
>
>               Df Sum Sq Mean Sq F value     Pr(>F)    
>
> Week          10 596431    59643 11.0534    7.5e-13 ***
>
> Week:Habitat 10 196349    19635   3.6389 0.0003251 ***
>
> Residuals    110 593551     5396              

This actually means:

For the F test of Habitat, the denominator MS is that for Location
 
For the F test of Week, the denominator MS is that for the Location x
Week
 
For the F test of Habitat x Week, the denominator MS is that for
Location x Week


And then, you wrote your attempt with lmer:

> m. = lmer(CO ~ Week * Habitat + (1|Habitat/Location))

The random term here (1|Habitat/Location) has nothing to do with the
Error term you used in aov.

If location is really nested within Habitat, perhaps you meant

m. = lmer(CO ~ Week * Habitat + (1|Habitat:Location))

(Habitat/Location means that Habitat has a random effect per se as
well, and I guess you don't mean that!)
Or if Location is not really nested,

m. = lmer(CO ~ Week * Habitat + (1|Location))

or if you really wanted the same model as with aov:

m. = lmer(CO ~ Week * Habitat + (1|Location/Week))

Please clarify your model. Otherwise it would be impossible to make any
comparison.

Helios





>>> El día 29/09/2011 a las 6:30, Dave Robichaud <[hidden email]>
escribió:
> Hi All,
>
> I am frustrated by mixed-effects model! I have searched the web for
> hours, and found lots on the nested anova, but nothing useful on my
> specific case, which is: a random factor (C) is nested within one of
the
> fixed-factors (A), and a second fixed factor (B) is crossed with the

> first fixed factor:
>
> C/A
>
> A
>
> B
>
> A x B
>
> My question: I have a functioning model using the aov command (see
> below), and I would now would like to recode it, using a more
flexible
> command such as lme or lmer. Once I have the equivalent syntax down,
I
> would ideally like to re-run my analysis using "family = poisson", as
CO
> is actually count data.
>
> I have a dataset including a response variable CO, measured once per

> Week (for 11 weeks) at 13 Locations. The 13 Locations are divided
into 2

> habitat types (Control and Treatment).
>
> Thus:
>
> CO is a continuous response variable,
>
> Week is a fixed categorical factor,
>
> Habitat is a fixed categorical factor, and
>
> Location is a random categorical factor nested within Habitat.
>
> Here is my model in R:
>
> mCO = aov(CO ~ Week * Habitat + Error(Location/Week))
>
> summary(mCO)
>
> And the output:
>
> Error: Location
>
>            Df Sum Sq Mean Sq F value   Pr(>F)  
>
> Habitat     1 182566   182566   8.6519 0.01341 *
>
> Residuals 11 232115    21101                  
>
>  
>
> Error: Location:Week
>
>               Df Sum Sq Mean Sq F value     Pr(>F)    
>
> Week          10 596431    59643 11.0534    7.5e-13 ***
>
> Week:Habitat 10 196349    19635   3.6389 0.0003251 ***
>
> Residuals    110 593551     5396                      
>
> Given that this is a mixed model, I believe the appropriate error
terms
> are as follows:
>
> For the F test of Habitat, the denominator MS is that for
location/habitat;
>
> For the F test of Week, the denominator MS is the residual; and
>
> For the F test of Habitat x Week, the denominator MS is the
residual.
>
> My tinkering with lmer and lme have not produced results similar to
the

> above
>
> For example,
>
> m. = lmer(CO ~ Week * Habitat + (1|Habitat/Location))
>
> anova(m.)
>
> produces:
>
> Analysis of Variance Table
>
>                Df Sum Sq Mean Sq F value
>
> Week           10 596431    59643 11.0534
>
> Habitat        1   28652    28652   5.3100
>
> Week:Habitat 10 196349    19635   3.6389
>
> Any coding advice would be greatly appreciated!
>
> Thanks for your consideration,
>
> Dave Robichaud
>
>
> [[alternative HTML version deleted]]

INSTITUTO DE BIOMECÁNICA DE VALENCIA
Universidad Politécnica de Valencia • Edificio 9C
Camino de Vera s/n • 46022 VALENCIA (ESPAÑA)
Tel. +34 96 387 91 60 • Fax +34 96 387 91 69
www.ibv.org

  Antes de imprimir este e-mail piense bien si es necesario hacerlo.
En cumplimiento de la Ley Orgánica 15/1999 reguladora de la Protección
de Datos de Carácter Personal, le informamos de que el presente mensaje
contiene información confidencial, siendo para uso exclusivo del
destinatario arriba indicado. En caso de no ser usted el destinatario
del mismo le informamos que su recepción no le autoriza a su divulgación
o reproducción por cualquier medio, debiendo destruirlo de inmediato,
rogándole lo notifique al remitente.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to Code Random Nested Variables within Two-way Fixed Model in lmer or lme

David Robichaud
Hi again,

Thank you very much for taking the time to respond to my question.  I am
sorry that my explanation was confusing.  Please allow me to try to clarify.

First, please ignore my attempts to define a lmer model.  By putting
forward my best first guess, which was clearly wrong, I have only served
to confuse matters.  My goal here is to get advice on how to formulate
the correct lmer model.  Hopefully someone can help with that.

I should describe my data in more detail.  I have the following columns:

Location    Habitat    Week     CO
1           Control    1        10
2           Control    1        12
3           Control    1         0
4           Control    1         5
5           Treatment  1        10
6 Treatment 1         7
7 Treatment  1         8
8 Treatment  1         6
9 Treatment  1         0
10 Treatment  1         5
11 Treatment  1         3
12 Treatment  1         12
13 Treatment  1         0
...    (9 weeks of data omitted to save space)
1           Control    11         9
2           Control    11         8
3           Control    11         3
4           Control    11         6
5           Treatment  11         9
6 Treatment 11         6
7 Treatment  11         5
8 Treatment  11        10
9 Treatment  11         2
10 Treatment  11         4
11 Treatment  11         6
12 Treatment  11         9
13 Treatment  11         2

 From this, you will see that I have 4 control sites and 7 treatment
sites that are measured each week.  All 13 locations have different
names, and Location is a random varaible.  Is Location nested within
Habitat?  I thought it was, but maybe I am wrong.  Perhaps it is a
random variable that is not nested?

My main goal is to look for an effect of Habitat.  But if there is a
significant Week x Habitat interaction, I would examine the effect of
Habitat separately for each Week.

Hopefully, the above helps to clarify my situation.  I should re-state,
I would like to use an lmer or lme syntax to properly analyze these
data, especially given that they are counts, I would like to try family
= poisson or quasipoisson.

Thanks again,

Dave




On 29/09/2011 8:54 AM, Helios de Rosario wrote:

> Dear Dave, there are some inconsistencies in your explanation of the
> problem. You said your variables are:
>
>    
>> CO is a continuous response variable,
>>
>> Week is a fixed categorical factor,
>>
>> Habitat is a fixed categorical factor, and
>>
>> Location is a random categorical factor nested within Habitat.
>>      
> What does this last statement mean? Are the Locations identified with
> the same names in both Habitats (e.g. Location=1,2,3,... for
> Habitat=Control, and then Location=1,2,3... for Habitat=Treatment,
> although the Locations of both Habitats have nothing to do with each
> other? Or do all 13 Locations receive different names?
>
> If Location is really nested within Habitat, you would be in the former
> case, and then your random terms should include the interaction
> "Location:Habitat". In the latter case, the random term would just be
> "Location".
>
> But then, your model with aov is:
>
>    
>> mCO = aov(CO ~ Week * Habitat + Error(Location/Week))
>>      
> Since you don't include Habitat in the Error term, I would say that
> Location is not really nested within Habitat. But then, why is Week
> nested within Location? Do you mean that the effect of Week may be
> affected by the random Location?
>
> Anyway, your interpretation of the ANOVA table is misleading:
>
>    
>> Error: Location
>>
>>             Df Sum Sq Mean Sq F value   Pr(>F)
>>
>> Habitat     1 182566   182566   8.6519 0.01341 *
>>
>> Residuals 11 232115    21101
>>
>>
>>
>> Error: Location:Week
>>
>>                Df Sum Sq Mean Sq F value     Pr(>F)
>>
>> Week          10 596431    59643 11.0534    7.5e-13 ***
>>
>> Week:Habitat 10 196349    19635   3.6389 0.0003251 ***
>>
>> Residuals    110 593551     5396
>>      
> This actually means:
>
> For the F test of Habitat, the denominator MS is that for Location
>
> For the F test of Week, the denominator MS is that for the Location x
> Week
>
> For the F test of Habitat x Week, the denominator MS is that for
> Location x Week
>
>
> And then, you wrote your attempt with lmer:
>
>    
>> m. = lmer(CO ~ Week * Habitat + (1|Habitat/Location))
>>      
> The random term here (1|Habitat/Location) has nothing to do with the
> Error term you used in aov.
>
> If location is really nested within Habitat, perhaps you meant
>
> m. = lmer(CO ~ Week * Habitat + (1|Habitat:Location))
>
> (Habitat/Location means that Habitat has a random effect per se as
> well, and I guess you don't mean that!)
> Or if Location is not really nested,
>
> m. = lmer(CO ~ Week * Habitat + (1|Location))
>
> or if you really wanted the same model as with aov:
>
> m. = lmer(CO ~ Week * Habitat + (1|Location/Week))
>
> Please clarify your model. Otherwise it would be impossible to make any
> comparison.
>
> Helios
>
>
>
>
>
>    
>>>> El día 29/09/2011 a las 6:30, Dave Robichaud<[hidden email]>
>>>>          
> escribió:
>    
>> Hi All,
>>
>> I am frustrated by mixed-effects model! I have searched the web for
>> hours, and found lots on the nested anova, but nothing useful on my
>> specific case, which is: a random factor (C) is nested within one of
>>      
> the
>    
>> fixed-factors (A), and a second fixed factor (B) is crossed with the
>>      
>    
>> first fixed factor:
>>
>> C/A
>>
>> A
>>
>> B
>>
>> A x B
>>
>> My question: I have a functioning model using the aov command (see
>> below), and I would now would like to recode it, using a more
>>      
> flexible
>    
>> command such as lme or lmer. Once I have the equivalent syntax down,
>>      
> I
>    
>> would ideally like to re-run my analysis using "family = poisson", as
>>      
> CO
>    
>> is actually count data.
>>
>> I have a dataset including a response variable CO, measured once per
>>      
>    
>> Week (for 11 weeks) at 13 Locations. The 13 Locations are divided
>>      
> into 2
>    
>> habitat types (Control and Treatment).
>>
>> Thus:
>>
>> CO is a continuous response variable,
>>
>> Week is a fixed categorical factor,
>>
>> Habitat is a fixed categorical factor, and
>>
>> Location is a random categorical factor nested within Habitat.
>>
>> Here is my model in R:
>>
>> mCO = aov(CO ~ Week * Habitat + Error(Location/Week))
>>
>> summary(mCO)
>>
>> And the output:
>>
>> Error: Location
>>
>>             Df Sum Sq Mean Sq F value   Pr(>F)
>>
>> Habitat     1 182566   182566   8.6519 0.01341 *
>>
>> Residuals 11 232115    21101
>>
>>
>>
>> Error: Location:Week
>>
>>                Df Sum Sq Mean Sq F value     Pr(>F)
>>
>> Week          10 596431    59643 11.0534    7.5e-13 ***
>>
>> Week:Habitat 10 196349    19635   3.6389 0.0003251 ***
>>
>> Residuals    110 593551     5396
>>
>> Given that this is a mixed model, I believe the appropriate error
>>      
> terms
>    
>> are as follows:
>>
>> For the F test of Habitat, the denominator MS is that for
>>      
> location/habitat;
>    
>> For the F test of Week, the denominator MS is the residual; and
>>
>> For the F test of Habitat x Week, the denominator MS is the
>>      
> residual.
>    
>> My tinkering with lmer and lme have not produced results similar to
>>      
> the
>    
>> above
>>
>> For example,
>>
>> m. = lmer(CO ~ Week * Habitat + (1|Habitat/Location))
>>
>> anova(m.)
>>
>> produces:
>>
>> Analysis of Variance Table
>>
>>                 Df Sum Sq Mean Sq F value
>>
>> Week           10 596431    59643 11.0534
>>
>> Habitat        1   28652    28652   5.3100
>>
>> Week:Habitat 10 196349    19635   3.6389
>>
>> Any coding advice would be greatly appreciated!
>>
>> Thanks for your consideration,
>>
>> Dave Robichaud
>>
>>
>> [[alternative HTML version deleted]]
>>      
> INSTITUTO DE BIOMECÁNICA DE VALENCIA
> Universidad Politécnica de Valencia • Edificio 9C
> Camino de Vera s/n • 46022 VALENCIA (ESPAÑA)
> Tel. +34 96 387 91 60 • Fax +34 96 387 91 69
> www.ibv.org
>
>    Antes de imprimir este e-mail piense bien si es necesario hacerlo.
> En cumplimiento de la Ley Orgánica 15/1999 reguladora de la Protección
> de Datos de Carácter Personal, le informamos de que el presente mensaje
> contiene información confidencial, siendo para uso exclusivo del
> destinatario arriba indicado. En caso de no ser usted el destinatario
> del mismo le informamos que su recepción no le autoriza a su divulgación
> o reproducción por cualquier medio, debiendo destruirlo de inmediato,
> rogándole lo notifique al remitente.
>
>
>
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: How to Code Random Nested Variables within Two-way Fixed Model in lmer or lme

Helios de Rosario
Dave, your situation is clearer now. You wrote (see the full context at
the end of the message):

>  From this, you will see that I have 4 control sites and 7 treatment

> sites that are measured each week.  All 13 locations have different
> names, and Location is a random varaible.  Is Location nested within

> Habitat?  I thought it was, but maybe I am wrong.  Perhaps it is a
> random variable that is not nested?

I think so, or at least that nesting is irrelevant. If all factors were
fixed, to specify a nesting of Location within Habitat, you should write
the term "Habitat/Location", but that is equal to "Habitat +
Habitat:Location", and since there cannot be two different values of
Habitat for the same Location, that's exactly the same as "Habitat +
Location". So you can safely separate both terms.

Moreover, the different type of effect for Habitat and Location makes
the nesting notation unsuitable, because Habitat is a fixed effect,
whereas Habitat:Location (i.e. Location) is random. If you put
"Habitat/Location" in the "fixed" part of the formula, both terms would
be treated as fixed, and if you put it in the "random" part, both would
be treated as random, and you don't want that, I suppose.

> My main goal is to look for an effect of Habitat.  But if there is a

> significant Week x Habitat interaction, I would examine the effect of

> Habitat separately for each Week.
>
> Hopefully, the above helps to clarify my situation.  I should
re-state,
> I would like to use an lmer or lme syntax to properly analyze these
> data, especially given that they are counts, I would like to try
family
> = poisson or quasipoisson.

If you need a generalized linear model, you can try lmer (in package
lme4), since it supports the "family" argument where you can specify the
type of error distribution. On the other hand, lme (in package nlme)
only considers normal errors.

Regarding the formula, I like building formulas term by term, asking
myself if each possible term has a potential effect, and including it in
the model if I can answer "yes". From what you have said, I can infer
that you do think that there may be a Week:Habitat interaction, so that
term must be in your formula. Now:

1. Do you think that the habitat may influence your outcome (the
counts), regardless of the other factors? I guess you do, so let's
include Habitat as well.

2. Do you think that Week may influence the counts, regardless of the
other factors?

  - 2.1. If so, the fixed part of your formula would be Habitat*Week (=
Habitat + Week + Week:Habitat)
 
  - 2.2. Otherwise, it would be Habitat/Week (= Habitat +
Habitat:Week)

As already commented on, the random part would just be Location. In
theory, since each location is measured in various weeks, you might
consider that the (fixed) effect of Week could be influenced by the
random Location as well, and in that case you would have an additional
random term, the Location:Week interaction. (I.e., you could write the
random term as "Location/Week".) However, in your data set there is only
one observation for each value of Location:Week, so it would be
impossible to distinguish that random term from the residual error, and
you may just omit it.

All in all, you can try:

m. <- lmer(CO ~ Habitat*Week + (1|Location), family=poisson)

or if Week is only relevant for different types of habitat:

m. <- lmer(CO ~ Habitat/Week + (1|Location), family=poisson)

I must admit that I'm not used to analyse generalized linear models, so
I don't know if that approach is correct, but I'd say that's the code to
do what you asked for.

Now, the bad news is that perhaps you are expecting to get p-values
from anova(m.), but you won't get it. Douglas Bates explained why here:
https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html 

On the other hand, you can get p-values from Anova (from package car),
instead of anova, but I don't entirely understand the calculations of
that approach.

Helios

>>> El día 29/09/2011 a las 20:49, Dave Robichaud <[hidden email]>
escribió:
> Hi again,
>
> Thank you very much for taking the time to respond to my question.  I
am
> sorry that my explanation was confusing.  Please allow me to try to
clarify.
>
> First, please ignore my attempts to define a lmer model.  By putting

> forward my best first guess, which was clearly wrong, I have only
served
> to confuse matters.  My goal here is to get advice on how to
formulate
> the correct lmer model.  Hopefully someone can help with that.
>
> I should describe my data in more detail.  I have the following
columns:

>
> Location    Habitat    Week     CO
> 1           Control    1        10
> 2           Control    1        12
> 3           Control    1         0
> 4           Control    1         5
> 5           Treatment  1        10
> 6 Treatment 1         7
> 7 Treatment  1         8
> 8 Treatment  1         6
> 9 Treatment  1         0
> 10 Treatment  1         5
> 11 Treatment  1         3
> 12 Treatment  1         12
> 13 Treatment  1         0
> ...    (9 weeks of data omitted to save space)
> 1           Control    11         9
> 2           Control    11         8
> 3           Control    11         3
> 4           Control    11         6
> 5           Treatment  11         9
> 6 Treatment 11         6
> 7 Treatment  11         5
> 8 Treatment  11        10
> 9 Treatment  11         2
> 10 Treatment  11         4
> 11 Treatment  11         6
> 12 Treatment  11         9
> 13 Treatment  11         2
>
>  From this, you will see that I have 4 control sites and 7 treatment

> sites that are measured each week.  All 13 locations have different
> names, and Location is a random varaible.  Is Location nested within

> Habitat?  I thought it was, but maybe I am wrong.  Perhaps it is a
> random variable that is not nested?
>
> My main goal is to look for an effect of Habitat.  But if there is a

> significant Week x Habitat interaction, I would examine the effect of

> Habitat separately for each Week.
>
> Hopefully, the above helps to clarify my situation.  I should
re-state,
> I would like to use an lmer or lme syntax to properly analyze these
> data, especially given that they are counts, I would like to try
family
> = poisson or quasipoisson.
>
> Thanks again,
>
> Dave

[Copy of previous posts snipped off. See the previous part history of
this thread in:
https://stat.ethz.ch/pipermail/r-help/2011-September/291178.html ]

INSTITUTO DE BIOMECÁNICA DE VALENCIA
Universidad Politécnica de Valencia • Edificio 9C
Camino de Vera s/n • 46022 VALENCIA (ESPAÑA)
Tel. +34 96 387 91 60 • Fax +34 96 387 91 69
www.ibv.org

  Antes de imprimir este e-mail piense bien si es necesario hacerlo.
En cumplimiento de la Ley Orgánica 15/1999 reguladora de la Protección
de Datos de Carácter Personal, le informamos de que el presente mensaje
contiene información confidencial, siendo para uso exclusivo del
destinatario arriba indicado. En caso de no ser usted el destinatario
del mismo le informamos que su recepción no le autoriza a su divulgación
o reproducción por cualquier medio, debiendo destruirlo de inmediato,
rogándole lo notifique al remitente.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.