GLMM random effects

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

GLMM random effects

Natasha
Hello,
 
I have a couple questions regarding generalized linear mixed models specifically around fitting the random effects terms correctly to account for any pseudo-replication.
I am reading through and trying to follow examples from Zuur et al. Mixed Effects Models and Extensions in Ecology with R, but am still at bit unsure if I am specifying the models correctly.
 
Background information:
Our study design:   We used mark-recapture trapping to obtain density estimates and demography of prairie dogs. The location of our study sites are nested, we have 6 trapping plots within 3 colonies of prairie dogs. Within each colony 3 plots are treatment (given supplemental food) and 3 are control sites. Therefore, in total we have 18 study plots (9 control and 9 treatment).
Our study spans over two years and within each year we trapped prairie dogs twice (Spring 2008, Summer 2008, Spring 2009, Summer 2009) for a total of four trapping sessions. Therefore we also have temporal repeated measures in our sampling design. From the mark-recapture data we estimated density using Program DENSITY for each study plot and for each session (a total of 4 density estimates for each 18 plots over time). We also did vegetation sampling to obtain estimates of the natural food available in each plot, which gave us total biomass and % edible vegetation as covariates.
 
We are using the lme4 package in program R (2.11.1) for analysis:
We fitted a GLMM with Laplace restricted maximum likelihood estimation. Since our response variable density is a count with overdispersion we used a corrected poisson distribution with a log link function. And in order to account for the nested and repeated measures in our design we want to include Plot nested in Colony as random effects. The explanatory variables include Treatment, Session, Total biomass, and/or %Edible food for the fixed effects. (Colony, Plot, Treatment, and Session are all factor variables).
 
And an example of the R code that we started with is:
 
PDmodel_1 <- lmer(Density~1+Treatment+Session+Biomass+(1|Colony/Plot), family=quasipoisson, data=DensityPD)
 
My main question is:
Does this accurately take into account our nested and repeated measures design? Have we accounted for any possible pseudoreplication with this type of analysis?  Do I need to also look at type AR-1 error structures (or can I do that in a glmm?) Or is there a simpler way you might suggest?  I can also look at the change in density between time intervals instead density at time intervals if that makes anything easier to analyze?  I can also split Session into season and year if that is helpful too.
 
Any help, advice or guidance would be greatly appreciated. And if you need any more information or clarification, please ask.
 
Thanks so much for your time and any advice in advance,
natasha
Reply | Threaded
Open this post in threaded view
|

Re: GLMM random effects

bbolker
> Natasha <natasha.lloyd <at> gmail.com> writes:

  For future reference, you may have better luck *either* on
the r-sig-mixed-models help list (for advice about GLMMs) *or*
on r-sig-ecology (for advice about ecological studies) [it's not
considered polite to cross-post: try looking at the archives of
both lists to decide which is more appropriate].

> Background information:
> Our study design:   We used mark-recapture trapping to obtain density
> estimates and demography of prairie dogs. The location of our study sites
> are nested, we have 6 trapping plots within 3 colonies of prairie dogs.
> Within each colony 3 plots are treatment (given supplemental food) and 3 are
> control sites. Therefore, in total we have 18 study plots (9 control and 9
> treatment).
> Our study spans over two years and within each year we trapped prairie dogs
> twice (Spring 2008, Summer 2008, Spring 2009, Summer 2009) for a total of
> four trapping sessions. Therefore we also have temporal repeated measures in
> our sampling design. From the mark-recapture data we estimated density using
> Program DENSITY for each study plot and for each session (a total of 4
> density estimates for each 18 plots over time). We also did vegetation
> sampling to obtain estimates of the natural food available in each plot,
> which gave us total biomass and % edible vegetation as covariates.
>
> We are using the lme4 package in program R (2.11.1) for analysis:
> We fitted a GLMM with Laplace restricted maximum likelihood estimation.
> Since our response variable density is a count with overdispersion we used a
> corrected poisson distribution with a log link function.

  (I'm not sure "corrected" is a technically meaningful term:
do Zuur et al use this?)

> And in order to
> account for the nested and repeated measures in our design we want to
> include Plot nested in Colony as random effects. The explanatory variables
> include Treatment, Session, Total biomass, and/or %Edible food for the fixed
> effects. (Colony, Plot, Treatment, and Session are all factor variables).
>
> And an example of the R code that we started with is:
>
> PDmodel_1 <- lmer(Density~1+Treatment+Session+Biomass+(1|Colony/Plot),
> family=quasipoisson, data=DensityPD)
 
  (You don't need the "1" in this formula: the intercept is implicitly
included.)
 
> My main question is:
> Does this accurately take into account our nested and repeated measures
> design? Have we accounted for any possible pseudoreplication with this type
> of analysis?  Do I need to also look at type AR-1 error structures (or can I
> do that in a glmm?) Or is there a simpler way you might suggest?  I can also
> look at the change in density between time intervals instead density at time
> intervals if that makes anything easier to analyze?  I can also split
> Session into season and year if that is helpful too.

  This looks reasonable.  You're likely to have trouble estimating
the among-colony variance, because there are only 3 colonies (try
estimating a variance from 3 samples sometime) -- I wouldn't be
surprised if the estimate of the among-colony variance is zero, unless
the colonies are very different.  You might alternatively try
Colony as a fixed effect and Colony:Plot as the random effect
(or equivalently make sure your plots have 18 distinct labels rather
than being numbered 1..6 in each plot).

   It's a bit tricky to incorporate autoregressive structures in a GLMM --
if your densities are high (means > 5) you can try this in MASS::glmmPQL.
However, I suspect you're unlikely to be able to resolve AR structure
in this data set -- if you're worried, take the residuals and examine
the correlation between r(t) and r(t+1) [you'll have 54 points -- one for
each plot/session combination, not counting the final session]. If that
correlation is small/non-significant, you're off the hook.

   Quasipoisson fits are a little bit problematic in lme4 -- two alternatives
are (1) fitting a lognormal-Poisson model by adding a random effect with
a distinct level for each observation (2) fitting a negative binomial
model with glmm.admb (glmm.admb only allows a single random effect,
but you could do that if you make Colony fixed as suggested above).

  Also keep in mind the rule of thumb that you need at least 10-20 observations
per parameter you want to estimate -- with 72 observations, you're pushing
it a little (intercept (1) + treatment (1) + session (3) + biomass (1) +
random effects (2)) (it's hard to know exactly how to count random effects,
but this is a crude rule of thumb).  I certainly wouldn't try to make
your model any *more* complicated ...

  good luck

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.