lmer, p-values and all that

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

lmer, p-values and all that

Douglas Bates-2
Users are often surprised and alarmed that the summary of a linear
mixed model fit by lmer provides estimates of the fixed-effects
parameters, standard errors for these parameters and a t-ratio but no
p-values.  Similarly the output from anova applied to a single lmer
model provides the sequential sums of squares for the terms in the
fixed-effects specification and the corresponding numerator degrees of
freedom but no denominator degrees of freedom and, again, no p-values.

Because they feel that the denominator degrees of freedom and the
corresponding p-values can easily be calculated they conclude that
failure to do this is a sign of inattention or, worse, incompetence on
the part of the person who wrote lmer (i.e. me).

Perhaps I can try again to explain why I don't quote p-values or, more
to the point, why I do not take the "obviously correct" approach of
attempting to reproduce the results provided by SAS.  Let me just say
that, although there are those who feel that the purpose of the R
Project - indeed the purpose of any statistical computing whatsoever -
is to reproduce the p-values provided by SAS, I am not a member of
that group.  If those people feel that I am a heretic for even
suggesting that a p-value provided by SAS could be other than absolute
truth and that I should be made to suffer a slow, painful death by
being burned at the stake for my heresy, then I suppose that we will
be able to look forward to an exciting finale to the conference dinner
at UseR!2006 next month. (Well, I won't be looking forward to such a
finale but the rest of you can.)

As most of you know the t-statistic for a coefficient in the
fixed-effects model matrix is the square root of an F statistic with 1
numerator degree of freedom so we can, without loss of generality,
concentrate on the F statistics that were present in the anova output.
 Those who long ago took courses in "analysis of variance" or
"experimental design" that concentrated on designs for agricultural
experiments would have learned methods for estimating variance
components based on observed and expected mean squares and methods of
testing based on "error strata".  (If you weren't forced to learn
this, consider yourself lucky.)  It is therefore natural to expect
that the F statistics created from an lmer model (and also those
created by SAS PROC MIXED) are based on error strata but that is not
the case.

The parameter estimates calculated by lmer are the maximum likelihood
or the REML (residual maximum likelihood) estimates and they are not
based on observed and expected mean squares or on error strata.  And
that's a good thing because lmer can handle unbalanced designs with
multiple nested or fully crossed or partially crossed grouping factors
for the random effects.  This is important for analyzing data from
large observational studies such as occur in psychometrics.

There are many aspects of the formulation of the model and the
calculation of the parameter estimates that are very interesting to me
and have occupied my attention for several years but let's assume that
the model has been specified, the data given and the parameter
estimates obtained.  How are the F statistics calculated?  The sums of
squares and degrees of freedom for the numerators are calculated as in
a linear model.  There is a slot in an lmer model that is similar to
the "effects" component in a lm model and that, along with the
"assign" attribute for the model matrix provides the numerator of the
F ratio.  The denominator is the penalized residual sum of squares
divided by the REML degrees of freedom, which is n-p where n is the
number of observations and p is the column rank of the model matrix
for the fixed effects.

Now read that last sentence again and pay particular attention to the
word "the" in the phrase "the penalized residual sum of squares".  All
the F ratios use the same denominator.  Let me repeat that - all the F
ratios use the *same* denominator.  This is why I have a problem with
the assumption (sometimes stated as more that just an assumption -
something on the order of "absolute truth" again) that the reference
distribution for these F statistics should be an F distribution with a
known numerator degrees of freedom but a variable denominator degrees
of freedom and we can answer the question of how to calculate a
p-value by coming up with a formula to assign different denominator
degrees of freedom for each test.  The denominator doesn't change.
Why should the degrees of freedom for the denominator change?

Most of the research on tests for the fixed-effects specification in a
mixed model begin with the assumption that these statistics will have
an F distribution with a known numerator degrees of freedom and the
only purpose of the research is to decide how to obtain an approximate
denominator degrees of freedom.  I don't agree.

There is one approach that I think may be fruitful and that I am
currently pursuing.  The penalized least squares formulation of a
mixed-effects model shows that the residual sum of squares in actually
a penalized residual sum of squares and there is a quantity that
behaves like the degrees of freedom for a penalized least squares
problem.  I will insert code to calculate this and see how that
behaves in simulation.

For the time being, I would recommend using a Markov Chain Monte Carlo
sample (function mcmcsamp) to evaluate the properties of individual
coefficients (use HPDinterval or just summary from the "coda"
package).  Evaluating entire terms is more difficult but you can
always calculate the F ratio and put a lower bound on the denominator
degrees of freedom.

If anyone wants to contribute code to calculate the "obviously
correct" denominator degrees of freedom from SAS I will incorporate
it.  However, be warned that the penalized least squares approach and
sparse matrix methods used in lmer will require considerable
translation from the formulas which typically occur in papers on this
topic.  Generally those formulas involve the inverse of an n by n
matrix where n is the number of observations and you really, really
don't want to try to do that when you have a couple of million
observations.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: lmer, p-values and all that

Frank Harrell
Douglas Bates wrote:
> Users are often surprised and alarmed that the summary of a linear
. . . .
Doug,

I have been needing this kind of explanation.  That is very helpful.
Thank you.  I do a lot with penalized MLEs for ordinary regression and
logistic models and know that getting sensible P-values is not
straightforward even in that far simpler situation.

Frank
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Frank Harrell
Department of Biostatistics, Vanderbilt University
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: lmer, p-values and all that

Marc Schwartz (via MN)
On Fri, 2006-05-19 at 17:44 -0500, Frank E Harrell Jr wrote:

> Douglas Bates wrote:
> > Users are often surprised and alarmed that the summary of a linear
> . . . .
> Doug,
>
> I have been needing this kind of explanation.  That is very helpful.
> Thank you.  I do a lot with penalized MLEs for ordinary regression and
> logistic models and know that getting sensible P-values is not
> straightforward even in that far simpler situation.
>
> Frank

I would like to echo Frank's comments and say "Thanks" to Doug for
taking the time to provide this post.

I would also like to suggest that this issue has indeed become a FAQ and
would like to recommend that an addition to the main R FAQ be made
(wording TBD) but along the lines of:

  Why are p values not displayed when using lmer()?

The response could be:

  Doug Bates has kindly provided an extensive response in a post to the
  r-help list, which can be reviewed at:

  https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html


This might save Doug, Harold and Spencer (I am probably missing some
others here) keystrokes in the future...  :-)

Best regards,

Marc Schwartz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: lmer, p-values and all that

Doran, Harold
In reply to this post by Douglas Bates-2
here here. I had the wonderful benefit of sitting in a coffee shop in San Francisco with Doug listening to an excellent explanation. I only wish we could replicate that as an FAQ!    

Thanks for this, Doug.


-----Original Message-----
From: [hidden email] on behalf of Marc Schwartz
Sent: Fri 5/19/2006 7:54 PM
To: Frank E Harrell Jr
Cc: Douglas Bates; r-help
Subject: Re: [R] lmer, p-values and all that

On Fri, 2006-05-19 at 17:44 -0500, Frank E Harrell Jr wrote:

> Douglas Bates wrote:
> > Users are often surprised and alarmed that the summary of a linear
> . . . .
> Doug,
>
> I have been needing this kind of explanation.  That is very helpful.
> Thank you.  I do a lot with penalized MLEs for ordinary regression and
> logistic models and know that getting sensible P-values is not
> straightforward even in that far simpler situation.
>
> Frank

I would like to echo Frank's comments and say "Thanks" to Doug for
taking the time to provide this post.

I would also like to suggest that this issue has indeed become a FAQ and
would like to recommend that an addition to the main R FAQ be made
(wording TBD) but along the lines of:

  Why are p values not displayed when using lmer()?

The response could be:

  Doug Bates has kindly provided an extensive response in a post to the
  r-help list, which can be reviewed at:

  https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html


This might save Doug, Harold and Spencer (I am probably missing some
others here) keystrokes in the future...  :-)

Best regards,

Marc Schwartz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html




        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Loading...