Quantcast

mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Martijn Wieling
Dear useRs,

I am using mgcv version 1.7-16. When I create a model with a few
non-linear terms and a random intercept for (in my case) country using
s(Country,bs="re"), the representative line in my model (i.e.
approximate significance of smooth terms) for the random intercept
reads:
                        edf       Ref.df     F          p-value
s(Country)       36.127 58.551   0.644    0.982

Can I interpret this as there being no support for a random intercept
for country? However, when I compare the simpler model to the model
including the random intercept, the latter appears to be a significant
improvement.

> anova(gam1,gam2,test="F")
Model 1: ....
Model 2: .... + s(BirthNation, bs="re")
  Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
1    789.44     416.54
2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***

I hope somebody could help me in how I should proceed in these
situations. Do I include the random intercept or not?

I also have a related question. When I used to create a mixed-effects
regression model using lmer and included e.g., an interaction in the
fixed-effects structure, I would test if the inclusion of this
interaction was warranted using anova(lmer1,lmer2). It then would show
me that I invested 1 additional df and the resulting (possibly
significant) improvement in fit of my model.

This approach does not seem to work when using gam. In this case an
apparent investment of 1 degree of freedom for the interaction, might
result in an actual decrease of the degrees of freedom invested by the
total model (caused by a decrease of the edf's of splines in the model
with the interaction). In this case, how would I proceed in
determining if the model including the interaction term is better?

With kind regards,
Martijn Wieling

--
*******************************************
Martijn Wieling
http://www.martijnwieling.nl
[hidden email]
+31(0)614108622
*******************************************
University of Groningen
http://www.rug.nl/staff/m.b.wieling

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Simon Wood-4
Dear Martijn,

Thanks for the off line code and data: very helpful.

The answer to this is something of a 'can of worms'. Starting with the
p-value inconsistency. The problem here really is that neither test is
well justified in the case of s(...,"re") terms (and not having realised
the extent of the problem it's not flagged properly).

In the case of the p-value from `summary', the p-value is computed as if
the random effect were any other smooth. However the theory on which the
p-values for smooths rests does not hold for "re" terms (basically the
usual notion of smoothing bias is meaningless in the "re" case, and "re"
terms can not usually be well approximated by truncated eigen
approximations). The upshot is that you can get bias toward accepting
the null. I'll revert to doing something more sensible for "re" terms
for the next release, but it still won't be great, I guess.

The p-value from the comparison of models via 'anova' is equally suspect
for "re" terms. Basically, this test is justified as a rough
approximation in the case of usual smooth models, by the fact that we
can approximate the model smooths by unpenalized reduced rank eigen
approximations having degrees of freedom set to the effective degrees of
freedom of the smooths. Again, however, such reduced rank approximations
are generally not available for "re" terms, and I don't know if there is
then a decent justification for the test in this case.

'AIC' might then be seen as the answer for model selection, but Greven
and Kneib (2010, Biometrika), show that this is biased towards selecting
the larger random effects model in this case (they provide a correction,
but I'm not sure how easy it is to apply here).

You are left with a couple of sensible possibilities that are easy to
use, if it's not clear from the estimates that the term is zero. Both
involve using gam(...,method="REML") or gam(...,method="ML").

1. use gam.vcomp to get a confidence interval for the "re" variance
component. If this is bounded well away from zero, then the result is
clear.

2. Run a glrt test based on twice the difference in ML/REML score
reported for the 2 models (c.f. chisq on 1 df for your case). This
suffers from the usual problem of using a glrt test to test a variance
component for equality to zero. (AIC based on this marginal likelihood
doesn't fix the problem either --- see Greven and Kneib, again).

The second issue, that adding a fixed effect can reduce the EDF, while
improving the fit, is less of a problem, I think. If I'm happy to select
the degree of smoothness of a model by GCV, REML or whatever, then I
should also be happy to accept that the model with the fewer degrees of
freedom, but more variables, is better than the one with more degrees of
freedom and fewer variables. (The converse that I would ever reject the
better fitting, less complex model is obviously perverse).

You can get similar effects in ordinary linear modelling: adding an
important predictor gives such an improvement in fit that you can drop
polynomial dependencies on other predictors, so a model with more
degrees of freedom but fewer variables does worse than one with fewer
degrees of freedom and more variables... the issue is just a bit more
prominent when fitting GAMs because part of model selection is
integrated with fitting in this case.

best,
Simon




 > 08/05/12 15:01, Martijn Wieling wrote:

> Dear useRs,
>
> I am using mgcv version 1.7-16. When I create a model with a few
> non-linear terms and a random intercept for (in my case) country using
> s(Country,bs="re"), the representative line in my model (i.e.
> approximate significance of smooth terms) for the random intercept
> reads:
>                          edf       Ref.df     F          p-value
> s(Country)       36.127 58.551   0.644    0.982
>
> Can I interpret this as there being no support for a random intercept
> for country? However, when I compare the simpler model to the model
> including the random intercept, the latter appears to be a significant
> improvement.
>
>> anova(gam1,gam2,test="F")
> Model 1: ....
> Model 2: .... + s(BirthNation, bs="re")
>    Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
> 1    789.44     416.54
> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>
> I hope somebody could help me in how I should proceed in these
> situations. Do I include the random intercept or not?
>
> I also have a related question. When I used to create a mixed-effects
> regression model using lmer and included e.g., an interaction in the
> fixed-effects structure, I would test if the inclusion of this
> interaction was warranted using anova(lmer1,lmer2). It then would show
> me that I invested 1 additional df and the resulting (possibly
> significant) improvement in fit of my model.
>
> This approach does not seem to work when using gam. In this case an
> apparent investment of 1 degree of freedom for the interaction, might
> result in an actual decrease of the degrees of freedom invested by the
> total model (caused by a decrease of the edf's of splines in the model
> with the interaction). In this case, how would I proceed in
> determining if the model including the interaction term is better?
>
> With kind regards,
> Martijn Wieling
>
> --
> *******************************************
> Martijn Wieling
> http://www.martijnwieling.nl
> [hidden email]
> +31(0)614108622
> *******************************************
> University of Groningen
> http://www.rug.nl/staff/m.b.wieling
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Martijn Wieling
Dear Simon,

Thanks for your concise reply, this is very helpful.

With respect to my second question, however, I was not entirely clear
- or perhaps I'm misunderstanding your answer. What I meant is:
suppose I have a model with a random effect s(X, bs="re"). Now I want
to test if a certain (fixed-effect) predictor A improves the model.

I therefore compare:
m1 = gam(Y ~ s(X,bs="re"), data=dat)
m2 = gam(Y ~ A + s(X,bs="re"), data=dat)

What I didn't make explicit before is that A in the model summary of
m2 does not reach significance (e.g., p = 0.2). Comparing the models
m1 and m2, shows that m1 is the more complex model (as adding A
decreases the edf's invested in the ranef spline with more than 1),
and m1 is not significantly better than m2. Now my question is, should
I keep m2, even though A is not significant itself? Or should I ignore
the result of anova(m1,m2) anyway, given that this comparison is not
suitable when comparing models including random effects (as you argue
regarding my first question)?

If that is the case and the anova is not usable to compare m1 and m2
due to the random effect parameter, note that the same can occur
without random effects but when a non-linearity is included such as
s(Longitude,Latitude). What then is appropriate: keep m1 (which is
more complex), or use m2 (which has a less complex non-linearity, but
includes an additional non-significant fixed-effect factor).

With kind regards,
Martijn

--
*******************************************
Martijn Wieling
http://www.martijnwieling.nl
[hidden email]
+31(0)614108622
*******************************************
University of Groningen
http://www.rug.nl/staff/m.b.wieling
*******************************************


On Fri, May 11, 2012 at 5:43 PM, Simon Wood <[hidden email]> wrote:

> Dear Martijn,
>
> Thanks for the off line code and data: very helpful.
>
> The answer to this is something of a 'can of worms'. Starting with the
> p-value inconsistency. The problem here really is that neither test is well
> justified in the case of s(...,"re") terms (and not having realised the
> extent of the problem it's not flagged properly).
>
> In the case of the p-value from `summary', the p-value is computed as if the
> random effect were any other smooth. However the theory on which the
> p-values for smooths rests does not hold for "re" terms (basically the usual
> notion of smoothing bias is meaningless in the "re" case, and "re" terms can
> not usually be well approximated by truncated eigen approximations). The
> upshot is that you can get bias toward accepting the null. I'll revert to
> doing something more sensible for "re" terms for the next release, but it
> still won't be great, I guess.
>
> The p-value from the comparison of models via 'anova' is equally suspect for
> "re" terms. Basically, this test is justified as a rough approximation in
> the case of usual smooth models, by the fact that we can approximate the
> model smooths by unpenalized reduced rank eigen approximations having
> degrees of freedom set to the effective degrees of freedom of the smooths.
> Again, however, such reduced rank approximations are generally not available
> for "re" terms, and I don't know if there is then a decent justification for
> the test in this case.
>
> 'AIC' might then be seen as the answer for model selection, but Greven and
> Kneib (2010, Biometrika), show that this is biased towards selecting the
> larger random effects model in this case (they provide a correction, but I'm
> not sure how easy it is to apply here).
>
> You are left with a couple of sensible possibilities that are easy to use,
> if it's not clear from the estimates that the term is zero. Both involve
> using gam(...,method="REML") or gam(...,method="ML").
>
> 1. use gam.vcomp to get a confidence interval for the "re" variance
> component. If this is bounded well away from zero, then the result is clear.
>
> 2. Run a glrt test based on twice the difference in ML/REML score reported
> for the 2 models (c.f. chisq on 1 df for your case). This suffers from the
> usual problem of using a glrt test to test a variance component for equality
> to zero. (AIC based on this marginal likelihood doesn't fix the problem
> either --- see Greven and Kneib, again).
>
> The second issue, that adding a fixed effect can reduce the EDF, while
> improving the fit, is less of a problem, I think. If I'm happy to select the
> degree of smoothness of a model by GCV, REML or whatever, then I should also
> be happy to accept that the model with the fewer degrees of freedom, but
> more variables, is better than the one with more degrees of freedom and
> fewer variables. (The converse that I would ever reject the better fitting,
> less complex model is obviously perverse).
>
> You can get similar effects in ordinary linear modelling: adding an
> important predictor gives such an improvement in fit that you can drop
> polynomial dependencies on other predictors, so a model with more degrees of
> freedom but fewer variables does worse than one with fewer degrees of
> freedom and more variables... the issue is just a bit more prominent when
> fitting GAMs because part of model selection is integrated with fitting in
> this case.
>
> best,
> Simon
>
>
>
>
>
>> 08/05/12 15:01, Martijn Wieling wrote:
>>
>> Dear useRs,
>>
>> I am using mgcv version 1.7-16. When I create a model with a few
>> non-linear terms and a random intercept for (in my case) country using
>> s(Country,bs="re"), the representative line in my model (i.e.
>> approximate significance of smooth terms) for the random intercept
>> reads:
>>                         edf       Ref.df     F          p-value
>> s(Country)       36.127 58.551   0.644    0.982
>>
>> Can I interpret this as there being no support for a random intercept
>> for country? However, when I compare the simpler model to the model
>> including the random intercept, the latter appears to be a significant
>> improvement.
>>
>>> anova(gam1,gam2,test="F")
>>
>> Model 1: ....
>> Model 2: .... + s(BirthNation, bs="re")
>>   Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
>> 1    789.44     416.54
>> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>>
>> I hope somebody could help me in how I should proceed in these
>> situations. Do I include the random intercept or not?
>>
>> I also have a related question. When I used to create a mixed-effects
>> regression model using lmer and included e.g., an interaction in the
>> fixed-effects structure, I would test if the inclusion of this
>> interaction was warranted using anova(lmer1,lmer2). It then would show
>> me that I invested 1 additional df and the resulting (possibly
>> significant) improvement in fit of my model.
>>
>> This approach does not seem to work when using gam. In this case an
>> apparent investment of 1 degree of freedom for the interaction, might
>> result in an actual decrease of the degrees of freedom invested by the
>> total model (caused by a decrease of the edf's of splines in the model
>> with the interaction). In this case, how would I proceed in
>> determining if the model including the interaction term is better?
>>
>> With kind regards,
>> Martijn Wieling
>>
>> --
>> *******************************************
>> Martijn Wieling
>> http://www.martijnwieling.nl
>> [hidden email]
>> +31(0)614108622
>> *******************************************
>> University of Groningen
>> http://www.rug.nl/staff/m.b.wieling
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
> +44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Simon Wood-4
Martijn,

I don't think there is one right answer to this. If you look at things
in the way that one would usually view a smooth model then m2 is both
simpler (lower EDF) and fits better, so is simply a better model (if the
simpler model fits better then why would you not use it?).

But of course `simpler' depends on whether you view the random effect as
counting for one parameter, or for it's `effective degrees of freedom'.
  If it's the former then you should probably fit models using
method="ML" and compare via a GLRT test using the ML score, or simply
drop the fixed effect if its p-value according to anova(m2) is too high.

I would not use anova(m1,m2) in this case, because of the difficulty in
interpreting the random effects as being equivalent to un-penalized
effects with rank equal to the random effect edfs.

best,
Simon

On 11/05/12 17:50, Martijn Wieling wrote:

> Dear Simon,
>
> Thanks for your concise reply, this is very helpful.
>
> With respect to my second question, however, I was not entirely clear
> - or perhaps I'm misunderstanding your answer. What I meant is:
> suppose I have a model with a random effect s(X, bs="re"). Now I want
> to test if a certain (fixed-effect) predictor A improves the model.
>
> I therefore compare:
> m1 = gam(Y ~ s(X,bs="re"), data=dat)
> m2 = gam(Y ~ A + s(X,bs="re"), data=dat)
>
> What I didn't make explicit before is that A in the model summary of
> m2 does not reach significance (e.g., p = 0.2). Comparing the models
> m1 and m2, shows that m1 is the more complex model (as adding A
> decreases the edf's invested in the ranef spline with more than 1),
> and m1 is not significantly better than m2. Now my question is, should
> I keep m2, even though A is not significant itself? Or should I ignore
> the result of anova(m1,m2) anyway, given that this comparison is not
> suitable when comparing models including random effects (as you argue
> regarding my first question)?
>
> If that is the case and the anova is not usable to compare m1 and m2
> due to the random effect parameter, note that the same can occur
> without random effects but when a non-linearity is included such as
> s(Longitude,Latitude). What then is appropriate: keep m1 (which is
> more complex), or use m2 (which has a less complex non-linearity, but
> includes an additional non-significant fixed-effect factor).
>
> With kind regards,
> Martijn
>
> --
> *******************************************
> Martijn Wieling
> http://www.martijnwieling.nl
> [hidden email]
> +31(0)614108622
> *******************************************
> University of Groningen
> http://www.rug.nl/staff/m.b.wieling
> *******************************************

--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Simon Wood-4
In reply to this post by Martijn Wieling
Having looked at this further, I've made some changes in mgcv_1.7-17 to
the p-value computations for terms that can be penalized to zero during
fitting (e.g. s(x,bs="re"), s(x,m=1) etc).

The Wald statistic based p-values from summary.gam and anova.gam (i.e.
what you get from e.g. anova(a) where a is a fitted gam object) are
quite well founded for smooth terms that are non-zero under full
penalization (e.g. a cubic spline is a straight line under full
penalization). For such smooths, an extension of Nychka's (1988) result
on CI's for splines gives a well founded distributional result on which
to base a Wald statistic. However, the Nychka result requires the
smoothing bias to be substantially less than the smoothing estimator
variance, and this will often not be the case if smoothing can actually
penalize a term to zero (to understand why, see argument in appendix of
Marra & Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).

Simulation testing shows that this theoretical concern has serious
practical consequences. So for terms that can be penalized to zero,
alternative approximations have to be used, and these are now
implemented in mgcv_1.7-17 (see ?summary.gam).

The approximate test performed by anova(a,b) (a and b are fitted "gam"
objects) is less well founded. It is a reasonable approximation when
each smooth term in the models could in principle be well approximated
by an unpenalized term of rank approximately equal to the edf of the
smooth term, but otherwise the p-values produced are likely to be much
too small. In particular simulation testing suggests that the test is
not to be trusted with s(...,bs="re") terms, and can be poor if the
models being compared involve any terms that can be penalized to zero
during fitting. (Although the mechanisms are a little different, this is
similar to the problem we would have if the models were viewed as
regular mixed models and we tried to use a GLRT to test variance
components for equality to zero).

These issues are now documented in ?anova.gam and ?summary.gam...

Simon

On 08/05/12 15:01, Martijn Wieling wrote:

> Dear useRs,
>
> I am using mgcv version 1.7-16. When I create a model with a few
> non-linear terms and a random intercept for (in my case) country using
> s(Country,bs="re"), the representative line in my model (i.e.
> approximate significance of smooth terms) for the random intercept
> reads:
>                          edf       Ref.df     F          p-value
> s(Country)       36.127 58.551   0.644    0.982
>
> Can I interpret this as there being no support for a random intercept
> for country? However, when I compare the simpler model to the model
> including the random intercept, the latter appears to be a significant
> improvement.
>
>> anova(gam1,gam2,test="F")
> Model 1: ....
> Model 2: .... + s(BirthNation, bs="re")
>    Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
> 1    789.44     416.54
> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>
> I hope somebody could help me in how I should proceed in these
> situations. Do I include the random intercept or not?
>
> I also have a related question. When I used to create a mixed-effects
> regression model using lmer and included e.g., an interaction in the
> fixed-effects structure, I would test if the inclusion of this
> interaction was warranted using anova(lmer1,lmer2). It then would show
> me that I invested 1 additional df and the resulting (possibly
> significant) improvement in fit of my model.
>
> This approach does not seem to work when using gam. In this case an
> apparent investment of 1 degree of freedom for the interaction, might
> result in an actual decrease of the degrees of freedom invested by the
> total model (caused by a decrease of the edf's of splines in the model
> with the interaction). In this case, how would I proceed in
> determining if the model including the interaction term is better?
>
> With kind regards,
> Martijn Wieling
>
> --
> *******************************************
> Martijn Wieling
> http://www.martijnwieling.nl
> [hidden email]
> +31(0)614108622
> *******************************************
> University of Groningen
> http://www.rug.nl/staff/m.b.wieling
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Martijn Wieling
Dear Simon,

I ran an additional analysis using bam (mgcv 1.7-17) with three random
intercepts and no non-linearities, and compared these to the results
of lmer (lme4). Using bam results in a significant random intercept
(even though it has a very low edf-value), while the lmer results show
no variance associated to the random intercept of Placename. Should I
drop the random intercept of Placename and if so, how is this apparent
from the results of bam?

Summaries of both models are shown below.

With kind regards,
Martijn

#### l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) +
(1|Placename), data=wrddst); print(l1,cor=F)

Linear mixed model fit by REML
Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1
|      Placename)
   Data: wrddst
    AIC    BIC logLik deviance REMLdev
 -44985 -44927  22498   -45009  -44997
Random effects:
 Groups    Name        Variance  Std.Dev.
 Word      (Intercept) 0.0800944 0.283009
 Key       (Intercept) 0.0013641 0.036933
 Placename (Intercept) 0.0000000 0.000000
 Residual              0.0381774 0.195390
Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.00342    0.01513   -0.23
geogamfit    0.99249    0.02612   37.99


#### m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word,bs="re") +
s(Key,bs="re") + s(Placename,bs="re"), data=wrddst,method="REML");
summary(m1,freq=F)

Family: gaussian
Link function: identity

Formula:
RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key,
    bs = "re") + s(Placename, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.00342    0.01513  -0.226    0.821
geogamfit    0.99249    0.02612  37.991   <2e-16 ***
---
Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1

Approximate significance of smooth terms:
                   edf Ref.df       F p-value
s(Word)      3.554e+02    347 634.716  <2e-16 ***
s(Key)       2.946e+02    316  23.054  <2e-16 ***
s(Placename) 1.489e-04     38   7.282  <2e-16 ***
---
Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1

R-sq.(adj) =  0.693   Deviance explained = 69.4%
REML score = -22498  Scale est. = 0.038177  n = 112608


On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R]
<[hidden email]> wrote:

> Having looked at this further, I've made some changes in mgcv_1.7-17 to
> the p-value computations for terms that can be penalized to zero during
> fitting (e.g. s(x,bs="re"), s(x,m=1) etc).
>
> The Wald statistic based p-values from summary.gam and anova.gam (i.e.
> what you get from e.g. anova(a) where a is a fitted gam object) are
> quite well founded for smooth terms that are non-zero under full
> penalization (e.g. a cubic spline is a straight line under full
> penalization). For such smooths, an extension of Nychka's (1988) result
> on CI's for splines gives a well founded distributional result on which
> to base a Wald statistic. However, the Nychka result requires the
> smoothing bias to be substantially less than the smoothing estimator
> variance, and this will often not be the case if smoothing can actually
> penalize a term to zero (to understand why, see argument in appendix of
> Marra & Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).
>
> Simulation testing shows that this theoretical concern has serious
> practical consequences. So for terms that can be penalized to zero,
> alternative approximations have to be used, and these are now
> implemented in mgcv_1.7-17 (see ?summary.gam).
>
> The approximate test performed by anova(a,b) (a and b are fitted "gam"
> objects) is less well founded. It is a reasonable approximation when
> each smooth term in the models could in principle be well approximated
> by an unpenalized term of rank approximately equal to the edf of the
> smooth term, but otherwise the p-values produced are likely to be much
> too small. In particular simulation testing suggests that the test is
> not to be trusted with s(...,bs="re") terms, and can be poor if the
> models being compared involve any terms that can be penalized to zero
> during fitting. (Although the mechanisms are a little different, this is
> similar to the problem we would have if the models were viewed as
> regular mixed models and we tried to use a GLRT to test variance
> components for equality to zero).
>
> These issues are now documented in ?anova.gam and ?summary.gam...
>
> Simon
>
> On 08/05/12 15:01, Martijn Wieling wrote:
>
>> Dear useRs,
>>
>> I am using mgcv version 1.7-16. When I create a model with a few
>> non-linear terms and a random intercept for (in my case) country using
>> s(Country,bs="re"), the representative line in my model (i.e.
>> approximate significance of smooth terms) for the random intercept
>> reads:
>>                          edf       Ref.df     F          p-value
>> s(Country)       36.127 58.551   0.644    0.982
>>
>> Can I interpret this as there being no support for a random intercept
>> for country? However, when I compare the simpler model to the model
>> including the random intercept, the latter appears to be a significant
>> improvement.
>>
>>> anova(gam1,gam2,test="F")
>> Model 1: ....
>> Model 2: .... + s(BirthNation, bs="re")
>>    Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
>> 1    789.44     416.54
>> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>>
>> I hope somebody could help me in how I should proceed in these
>> situations. Do I include the random intercept or not?
>>
>> I also have a related question. When I used to create a mixed-effects
>> regression model using lmer and included e.g., an interaction in the
>> fixed-effects structure, I would test if the inclusion of this
>> interaction was warranted using anova(lmer1,lmer2). It then would show
>> me that I invested 1 additional df and the resulting (possibly
>> significant) improvement in fit of my model.
>>
>> This approach does not seem to work when using gam. In this case an
>> apparent investment of 1 degree of freedom for the interaction, might
>> result in an actual decrease of the degrees of freedom invested by the
>> total model (caused by a decrease of the edf's of splines in the model
>> with the interaction). In this case, how would I proceed in
>> determining if the model including the interaction term is better?
>>
>> With kind regards,
>> Martijn Wieling
>>
>> --
>> *******************************************
>> Martijn Wieling
>> http://www.martijnwieling.nl
>> [hidden email]
>> +31(0)614108622
>> *******************************************
>> University of Groningen
>> http://www.rug.nl/staff/m.b.wieling
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
> +44 (0)1225 386603               http://people.bath.ac.uk/sw283
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ________________________________
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/mgcv-inclusion-of-random-intercept-in-model-based-on-p-value-of-smooth-or-anova-tp4617585p4631060.html
> To unsubscribe from mgcv: inclusion of random intercept in model - based on
> p-value of smooth or anova?, click here.
> NAML

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Simon Wood-4
Hi Martijn,

Irrespective of the p-value, 'bam' and 'lmer' agree that the variance
component for 'Placename' is practically zero. In the 'bam' output see
the 'edf' for s(Placename), or for a more direct comparison call
gam.vcomp(m1).

As mentioned in ?summary.gam the p-values for "re" terms are fairly
crude (and certainly don't solve all the usual problems with testing
variance components for equality to zero), so I would not take them too
seriously for testing whether your random effect is exactly zero, when
the estimates/predictions are this close to zero (the typical random
effect size for Placename is about 1e-8, after all). That said, when I
randomly re-shuffle Placename, so that the null hypothesis is true, then
the p-value distribution does look uniform, as it should, despite some
edfs even smaller than that for the original data.... So the low p-value
may simply reflect the common problem that even very tiny effects are
often statistically significant at high sample sizes, I suppose. Anyway,
unless effects of size 1e-8 are meaningful here, I would drop the
Placename term.

best,
Simon

ps. I'm not sure what effect the rather heavy tails on the residuals may
be having here?


On 11/06/12 14:56, Martijn Wieling wrote:

> Dear Simon,
>
> I ran an additional analysis using bam (mgcv 1.7-17) with three random
> intercepts and no non-linearities, and compared these to the results
> of lmer (lme4). Using bam results in a significant random intercept
> (even though it has a very low edf-value), while the lmer results show
> no variance associated to the random intercept of Placename. Should I
> drop the random intercept of Placename and if so, how is this apparent
> from the results of bam?
>
> Summaries of both models are shown below.
>
> With kind regards,
> Martijn
>
> #### l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) +
> (1|Placename), data=wrddst); print(l1,cor=F)
>
> Linear mixed model fit by REML
> Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1
> |      Placename)
>     Data: wrddst
>      AIC    BIC logLik deviance REMLdev
>   -44985 -44927  22498   -45009  -44997
> Random effects:
>   Groups    Name        Variance  Std.Dev.
>   Word      (Intercept) 0.0800944 0.283009
>   Key       (Intercept) 0.0013641 0.036933
>   Placename (Intercept) 0.0000000 0.000000
>   Residual              0.0381774 0.195390
> Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40
>
> Fixed effects:
>              Estimate Std. Error t value
> (Intercept) -0.00342    0.01513   -0.23
> geogamfit    0.99249    0.02612   37.99
>
>
> #### m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word,bs="re") +
> s(Key,bs="re") + s(Placename,bs="re"), data=wrddst,method="REML");
> summary(m1,freq=F)
>
> Family: gaussian
> Link function: identity
>
> Formula:
> RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key,
>      bs = "re") + s(Placename, bs = "re")
>
> Parametric coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.00342    0.01513  -0.226    0.821
> geogamfit    0.99249    0.02612  37.991<2e-16 ***
> ---
> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>
> Approximate significance of smooth terms:
>                     edf Ref.df       F p-value
> s(Word)      3.554e+02    347 634.716<2e-16 ***
> s(Key)       2.946e+02    316  23.054<2e-16 ***
> s(Placename) 1.489e-04     38   7.282<2e-16 ***
> ---
> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>
> R-sq.(adj) =  0.693   Deviance explained = 69.4%
> REML score = -22498  Scale est. = 0.038177  n = 112608
>
>
> On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R]
> <[hidden email]>  wrote:
>> Having looked at this further, I've made some changes in mgcv_1.7-17 to
>> the p-value computations for terms that can be penalized to zero during
>> fitting (e.g. s(x,bs="re"), s(x,m=1) etc).
>>
>> The Wald statistic based p-values from summary.gam and anova.gam (i.e.
>> what you get from e.g. anova(a) where a is a fitted gam object) are
>> quite well founded for smooth terms that are non-zero under full
>> penalization (e.g. a cubic spline is a straight line under full
>> penalization). For such smooths, an extension of Nychka's (1988) result
>> on CI's for splines gives a well founded distributional result on which
>> to base a Wald statistic. However, the Nychka result requires the
>> smoothing bias to be substantially less than the smoothing estimator
>> variance, and this will often not be the case if smoothing can actually
>> penalize a term to zero (to understand why, see argument in appendix of
>> Marra&  Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).
>>
>> Simulation testing shows that this theoretical concern has serious
>> practical consequences. So for terms that can be penalized to zero,
>> alternative approximations have to be used, and these are now
>> implemented in mgcv_1.7-17 (see ?summary.gam).
>>
>> The approximate test performed by anova(a,b) (a and b are fitted "gam"
>> objects) is less well founded. It is a reasonable approximation when
>> each smooth term in the models could in principle be well approximated
>> by an unpenalized term of rank approximately equal to the edf of the
>> smooth term, but otherwise the p-values produced are likely to be much
>> too small. In particular simulation testing suggests that the test is
>> not to be trusted with s(...,bs="re") terms, and can be poor if the
>> models being compared involve any terms that can be penalized to zero
>> during fitting. (Although the mechanisms are a little different, this is
>> similar to the problem we would have if the models were viewed as
>> regular mixed models and we tried to use a GLRT to test variance
>> components for equality to zero).
>>
>> These issues are now documented in ?anova.gam and ?summary.gam...
>>
>> Simon
>>
>> On 08/05/12 15:01, Martijn Wieling wrote:
>>
>>> Dear useRs,
>>>
>>> I am using mgcv version 1.7-16. When I create a model with a few
>>> non-linear terms and a random intercept for (in my case) country using
>>> s(Country,bs="re"), the representative line in my model (i.e.
>>> approximate significance of smooth terms) for the random intercept
>>> reads:
>>>                           edf       Ref.df     F          p-value
>>> s(Country)       36.127 58.551   0.644    0.982
>>>
>>> Can I interpret this as there being no support for a random intercept
>>> for country? However, when I compare the simpler model to the model
>>> including the random intercept, the latter appears to be a significant
>>> improvement.
>>>
>>>> anova(gam1,gam2,test="F")
>>> Model 1: ....
>>> Model 2: .... + s(BirthNation, bs="re")
>>>     Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
>>> 1    789.44     416.54
>>> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>>>
>>> I hope somebody could help me in how I should proceed in these
>>> situations. Do I include the random intercept or not?
>>>
>>> I also have a related question. When I used to create a mixed-effects
>>> regression model using lmer and included e.g., an interaction in the
>>> fixed-effects structure, I would test if the inclusion of this
>>> interaction was warranted using anova(lmer1,lmer2). It then would show
>>> me that I invested 1 additional df and the resulting (possibly
>>> significant) improvement in fit of my model.
>>>
>>> This approach does not seem to work when using gam. In this case an
>>> apparent investment of 1 degree of freedom for the interaction, might
>>> result in an actual decrease of the degrees of freedom invested by the
>>> total model (caused by a decrease of the edf's of splines in the model
>>> with the interaction). In this case, how would I proceed in
>>> determining if the model including the interaction term is better?
>>>
>>> With kind regards,
>>> Martijn Wieling
>>>
>>> --
>>> *******************************************
>>> Martijn Wieling
>>> http://www.martijnwieling.nl
>>> [hidden email]
>>> +31(0)614108622
>>> *******************************************
>>> University of Groningen
>>> http://www.rug.nl/staff/m.b.wieling
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> --
>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>> +44 (0)1225 386603               http://people.bath.ac.uk/sw283
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ________________________________
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://r.789695.n4.nabble.com/mgcv-inclusion-of-random-intercept-in-model-based-on-p-value-of-smooth-or-anova-tp4617585p4631060.html
>> To unsubscribe from mgcv: inclusion of random intercept in model - based on
>> p-value of smooth or anova?, click here.
>> NAML
>


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Simon Wood-4
In reply to this post by Martijn Wieling
Martin,

I had a nagging feeling that there must be more to this than my previous
reply suggested, and have been digging a bit further. Basically I would
expect these p-values to not be great if you had nested random effects
(such as main effects + their interaction), but your case looked rather
straightforward...

After some digging it turns out that the p-value for Placename is wrong
in this case, as you suspected. R routine `qr' has a `tol' argument that
by default is set to 1e-7. In the computation of the test statistic for
"re" terms I had called qr without changing this default to the value 0
that is actually appropriate (I should have noticed this, I've been
caught out by it before). With the correct 'tol', Placement is
non-significant. This will be fixed in the next release, but that won't
happen until I've tested the random effects p-value approximations a bit
more thoroughly.

best,
Simon



On 11/06/12 14:56, Martijn Wieling wrote:

> Dear Simon,
>
> I ran an additional analysis using bam (mgcv 1.7-17) with three random
> intercepts and no non-linearities, and compared these to the results
> of lmer (lme4). Using bam results in a significant random intercept
> (even though it has a very low edf-value), while the lmer results show
> no variance associated to the random intercept of Placename. Should I
> drop the random intercept of Placename and if so, how is this apparent
> from the results of bam?
>
> Summaries of both models are shown below.
>
> With kind regards,
> Martijn
>
> #### l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) +
> (1|Placename), data=wrddst); print(l1,cor=F)
>
> Linear mixed model fit by REML
> Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1
> |      Placename)
>     Data: wrddst
>      AIC    BIC logLik deviance REMLdev
>   -44985 -44927  22498   -45009  -44997
> Random effects:
>   Groups    Name        Variance  Std.Dev.
>   Word      (Intercept) 0.0800944 0.283009
>   Key       (Intercept) 0.0013641 0.036933
>   Placename (Intercept) 0.0000000 0.000000
>   Residual              0.0381774 0.195390
> Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40
>
> Fixed effects:
>              Estimate Std. Error t value
> (Intercept) -0.00342    0.01513   -0.23
> geogamfit    0.99249    0.02612   37.99
>
>
> #### m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word,bs="re") +
> s(Key,bs="re") + s(Placename,bs="re"), data=wrddst,method="REML");
> summary(m1,freq=F)
>
> Family: gaussian
> Link function: identity
>
> Formula:
> RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key,
>      bs = "re") + s(Placename, bs = "re")
>
> Parametric coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.00342    0.01513  -0.226    0.821
> geogamfit    0.99249    0.02612  37.991<2e-16 ***
> ---
> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>
> Approximate significance of smooth terms:
>                     edf Ref.df       F p-value
> s(Word)      3.554e+02    347 634.716<2e-16 ***
> s(Key)       2.946e+02    316  23.054<2e-16 ***
> s(Placename) 1.489e-04     38   7.282<2e-16 ***
> ---
> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>
> R-sq.(adj) =  0.693   Deviance explained = 69.4%
> REML score = -22498  Scale est. = 0.038177  n = 112608
>
>
> On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R]
> <[hidden email]>  wrote:
>> Having looked at this further, I've made some changes in mgcv_1.7-17 to
>> the p-value computations for terms that can be penalized to zero during
>> fitting (e.g. s(x,bs="re"), s(x,m=1) etc).
>>
>> The Wald statistic based p-values from summary.gam and anova.gam (i.e.
>> what you get from e.g. anova(a) where a is a fitted gam object) are
>> quite well founded for smooth terms that are non-zero under full
>> penalization (e.g. a cubic spline is a straight line under full
>> penalization). For such smooths, an extension of Nychka's (1988) result
>> on CI's for splines gives a well founded distributional result on which
>> to base a Wald statistic. However, the Nychka result requires the
>> smoothing bias to be substantially less than the smoothing estimator
>> variance, and this will often not be the case if smoothing can actually
>> penalize a term to zero (to understand why, see argument in appendix of
>> Marra&  Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).
>>
>> Simulation testing shows that this theoretical concern has serious
>> practical consequences. So for terms that can be penalized to zero,
>> alternative approximations have to be used, and these are now
>> implemented in mgcv_1.7-17 (see ?summary.gam).
>>
>> The approximate test performed by anova(a,b) (a and b are fitted "gam"
>> objects) is less well founded. It is a reasonable approximation when
>> each smooth term in the models could in principle be well approximated
>> by an unpenalized term of rank approximately equal to the edf of the
>> smooth term, but otherwise the p-values produced are likely to be much
>> too small. In particular simulation testing suggests that the test is
>> not to be trusted with s(...,bs="re") terms, and can be poor if the
>> models being compared involve any terms that can be penalized to zero
>> during fitting. (Although the mechanisms are a little different, this is
>> similar to the problem we would have if the models were viewed as
>> regular mixed models and we tried to use a GLRT to test variance
>> components for equality to zero).
>>
>> These issues are now documented in ?anova.gam and ?summary.gam...
>>
>> Simon
>>
>> On 08/05/12 15:01, Martijn Wieling wrote:
>>
>>> Dear useRs,
>>>
>>> I am using mgcv version 1.7-16. When I create a model with a few
>>> non-linear terms and a random intercept for (in my case) country using
>>> s(Country,bs="re"), the representative line in my model (i.e.
>>> approximate significance of smooth terms) for the random intercept
>>> reads:
>>>                           edf       Ref.df     F          p-value
>>> s(Country)       36.127 58.551   0.644    0.982
>>>
>>> Can I interpret this as there being no support for a random intercept
>>> for country? However, when I compare the simpler model to the model
>>> including the random intercept, the latter appears to be a significant
>>> improvement.
>>>
>>>> anova(gam1,gam2,test="F")
>>> Model 1: ....
>>> Model 2: .... + s(BirthNation, bs="re")
>>>     Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
>>> 1    789.44     416.54
>>> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>>>
>>> I hope somebody could help me in how I should proceed in these
>>> situations. Do I include the random intercept or not?
>>>
>>> I also have a related question. When I used to create a mixed-effects
>>> regression model using lmer and included e.g., an interaction in the
>>> fixed-effects structure, I would test if the inclusion of this
>>> interaction was warranted using anova(lmer1,lmer2). It then would show
>>> me that I invested 1 additional df and the resulting (possibly
>>> significant) improvement in fit of my model.
>>>
>>> This approach does not seem to work when using gam. In this case an
>>> apparent investment of 1 degree of freedom for the interaction, might
>>> result in an actual decrease of the degrees of freedom invested by the
>>> total model (caused by a decrease of the edf's of splines in the model
>>> with the interaction). In this case, how would I proceed in
>>> determining if the model including the interaction term is better?
>>>
>>> With kind regards,
>>> Martijn Wieling
>>>
>>> --
>>> *******************************************
>>> Martijn Wieling
>>> http://www.martijnwieling.nl
>>> [hidden email]
>>> +31(0)614108622
>>> *******************************************
>>> University of Groningen
>>> http://www.rug.nl/staff/m.b.wieling
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> --
>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>> +44 (0)1225 386603               http://people.bath.ac.uk/sw283
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ________________________________
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://r.789695.n4.nabble.com/mgcv-inclusion-of-random-intercept-in-model-based-on-p-value-of-smooth-or-anova-tp4617585p4631060.html
>> To unsubscribe from mgcv: inclusion of random intercept in model - based on
>> p-value of smooth or anova?, click here.
>> NAML
>


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?

Simon Wood-4
Martin,

I've just submitted mgcv_1.7-19 to CRAN, which includes a major upgrade
of the p-value computation for random effect terms (and any other smooth
term which can be penalized to zero as part of estimation). The new
p-values are still conditional on the smoothing parameter/variance
component estimates, but given those estimates are based on the null
distribution of the restricted likelihood ratio statistic. Basically, if
you are going to condition on the smoothing parameters, then this is the
right thing to do, but if smoothing parameter estimates are very
uncertain, then the p-values may be underestimates (in simulations the
underestimation seems to be rather modest).

In the Gaussian mixed additive model case, the alternative is to use the
RLRsim package, while for GLMMs the only way to further improve things
is by writing code to simulate for the null distribution of the test
statistic...

Thanks for pushing be into action on this!

best,
Simon


On 25/06/12 16:31, Simon Wood wrote:

> Martin,
>
> I had a nagging feeling that there must be more to this than my previous
> reply suggested, and have been digging a bit further. Basically I would
> expect these p-values to not be great if you had nested random effects
> (such as main effects + their interaction), but your case looked rather
> straightforward...
>
> After some digging it turns out that the p-value for Placename is wrong
> in this case, as you suspected. R routine `qr' has a `tol' argument that
> by default is set to 1e-7. In the computation of the test statistic for
> "re" terms I had called qr without changing this default to the value 0
> that is actually appropriate (I should have noticed this, I've been
> caught out by it before). With the correct 'tol', Placement is
> non-significant. This will be fixed in the next release, but that won't
> happen until I've tested the random effects p-value approximations a bit
> more thoroughly.
>
> best,
> Simon
>
>
>
> On 11/06/12 14:56, Martijn Wieling wrote:
>> Dear Simon,
>>
>> I ran an additional analysis using bam (mgcv 1.7-17) with three random
>> intercepts and no non-linearities, and compared these to the results
>> of lmer (lme4). Using bam results in a significant random intercept
>> (even though it has a very low edf-value), while the lmer results show
>> no variance associated to the random intercept of Placename. Should I
>> drop the random intercept of Placename and if so, how is this apparent
>> from the results of bam?
>>
>> Summaries of both models are shown below.
>>
>> With kind regards,
>> Martijn
>>
>> #### l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) +
>> (1|Placename), data=wrddst); print(l1,cor=F)
>>
>> Linear mixed model fit by REML
>> Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1
>> |      Placename)
>>     Data: wrddst
>>      AIC    BIC logLik deviance REMLdev
>>   -44985 -44927  22498   -45009  -44997
>> Random effects:
>>   Groups    Name        Variance  Std.Dev.
>>   Word      (Intercept) 0.0800944 0.283009
>>   Key       (Intercept) 0.0013641 0.036933
>>   Placename (Intercept) 0.0000000 0.000000
>>   Residual              0.0381774 0.195390
>> Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40
>>
>> Fixed effects:
>>              Estimate Std. Error t value
>> (Intercept) -0.00342    0.01513   -0.23
>> geogamfit    0.99249    0.02612   37.99
>>
>>
>> #### m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word,bs="re") +
>> s(Key,bs="re") + s(Placename,bs="re"), data=wrddst,method="REML");
>> summary(m1,freq=F)
>>
>> Family: gaussian
>> Link function: identity
>>
>> Formula:
>> RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key,
>>      bs = "re") + s(Placename, bs = "re")
>>
>> Parametric coefficients:
>>              Estimate Std. Error t value Pr(>|t|)
>> (Intercept) -0.00342    0.01513  -0.226    0.821
>> geogamfit    0.99249    0.02612  37.991<2e-16 ***
>> ---
>> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>>
>> Approximate significance of smooth terms:
>>                     edf Ref.df       F p-value
>> s(Word)      3.554e+02    347 634.716<2e-16 ***
>> s(Key)       2.946e+02    316  23.054<2e-16 ***
>> s(Placename) 1.489e-04     38   7.282<2e-16 ***
>> ---
>> Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>>
>> R-sq.(adj) =  0.693   Deviance explained = 69.4%
>> REML score = -22498  Scale est. = 0.038177  n = 112608
>>
>>
>> On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R]
>> <[hidden email]>  wrote:
>>> Having looked at this further, I've made some changes in mgcv_1.7-17 to
>>> the p-value computations for terms that can be penalized to zero during
>>> fitting (e.g. s(x,bs="re"), s(x,m=1) etc).
>>>
>>> The Wald statistic based p-values from summary.gam and anova.gam (i.e.
>>> what you get from e.g. anova(a) where a is a fitted gam object) are
>>> quite well founded for smooth terms that are non-zero under full
>>> penalization (e.g. a cubic spline is a straight line under full
>>> penalization). For such smooths, an extension of Nychka's (1988) result
>>> on CI's for splines gives a well founded distributional result on which
>>> to base a Wald statistic. However, the Nychka result requires the
>>> smoothing bias to be substantially less than the smoothing estimator
>>> variance, and this will often not be the case if smoothing can actually
>>> penalize a term to zero (to understand why, see argument in appendix of
>>> Marra&  Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).
>>>
>>> Simulation testing shows that this theoretical concern has serious
>>> practical consequences. So for terms that can be penalized to zero,
>>> alternative approximations have to be used, and these are now
>>> implemented in mgcv_1.7-17 (see ?summary.gam).
>>>
>>> The approximate test performed by anova(a,b) (a and b are fitted "gam"
>>> objects) is less well founded. It is a reasonable approximation when
>>> each smooth term in the models could in principle be well approximated
>>> by an unpenalized term of rank approximately equal to the edf of the
>>> smooth term, but otherwise the p-values produced are likely to be much
>>> too small. In particular simulation testing suggests that the test is
>>> not to be trusted with s(...,bs="re") terms, and can be poor if the
>>> models being compared involve any terms that can be penalized to zero
>>> during fitting. (Although the mechanisms are a little different, this is
>>> similar to the problem we would have if the models were viewed as
>>> regular mixed models and we tried to use a GLRT to test variance
>>> components for equality to zero).
>>>
>>> These issues are now documented in ?anova.gam and ?summary.gam...
>>>
>>> Simon
>>>
>>> On 08/05/12 15:01, Martijn Wieling wrote:
>>>
>>>> Dear useRs,
>>>>
>>>> I am using mgcv version 1.7-16. When I create a model with a few
>>>> non-linear terms and a random intercept for (in my case) country using
>>>> s(Country,bs="re"), the representative line in my model (i.e.
>>>> approximate significance of smooth terms) for the random intercept
>>>> reads:
>>>>                           edf       Ref.df     F          p-value
>>>> s(Country)       36.127 58.551   0.644    0.982
>>>>
>>>> Can I interpret this as there being no support for a random intercept
>>>> for country? However, when I compare the simpler model to the model
>>>> including the random intercept, the latter appears to be a significant
>>>> improvement.
>>>>
>>>>> anova(gam1,gam2,test="F")
>>>> Model 1: ....
>>>> Model 2: .... + s(BirthNation, bs="re")
>>>>     Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)
>>>> 1    789.44     416.54
>>>> 2    753.15     373.54 36.292   43.003 2.3891 1.225e-05 ***
>>>>
>>>> I hope somebody could help me in how I should proceed in these
>>>> situations. Do I include the random intercept or not?
>>>>
>>>> I also have a related question. When I used to create a mixed-effects
>>>> regression model using lmer and included e.g., an interaction in the
>>>> fixed-effects structure, I would test if the inclusion of this
>>>> interaction was warranted using anova(lmer1,lmer2). It then would show
>>>> me that I invested 1 additional df and the resulting (possibly
>>>> significant) improvement in fit of my model.
>>>>
>>>> This approach does not seem to work when using gam. In this case an
>>>> apparent investment of 1 degree of freedom for the interaction, might
>>>> result in an actual decrease of the degrees of freedom invested by the
>>>> total model (caused by a decrease of the edf's of splines in the model
>>>> with the interaction). In this case, how would I proceed in
>>>> determining if the model including the interaction term is better?
>>>>
>>>> With kind regards,
>>>> Martijn Wieling
>>>>
>>>> --
>>>> *******************************************
>>>> Martijn Wieling
>>>> http://www.martijnwieling.nl
>>>> [hidden email]
>>>> +31(0)614108622
>>>> *******************************************
>>>> University of Groningen
>>>> http://www.rug.nl/staff/m.b.wieling
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>> --
>>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>>> +44 (0)1225 386603               http://people.bath.ac.uk/sw283
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>> ________________________________
>>> If you reply to this email, your message will be added to the discussion
>>> below:
>>> http://r.789695.n4.nabble.com/mgcv-inclusion-of-random-intercept-in-model-based-on-p-value-of-smooth-or-anova-tp4617585p4631060.html
>>>
>>> To unsubscribe from mgcv: inclusion of random intercept in model -
>>> based on
>>> p-value of smooth or anova?, click here.
>>> NAML
>>
>
>


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...