lmer p-values are sometimes too small

lmer p-values are sometimes too small

Olof Leimar
This concerns whether p-values from lmer can be trusted. From
simulations, it seems that lmer can produce very small, and probably
spurious, p-values. I realize that lmer is not yet a finished product.
Is it likely that the problem will be fixed in a future release of the
lme4 package?

Using simulated data for a quite standard mixed-model anova (a balanced
two-way design; see code for the function SimMixed pasted below), I
compared the output of lmer, for three slightly different models, with
the output of aov. For an example where there is no fixed treatment
effect (null hypothesis is true), with 4 blocks, 2 treatments, and 40
observations per treatment-block combination, I find that lmer gives
more statistical significances than it should, whereas aov does not have
this problem. An example of output I generated by calling
  > SimMixed(1000)
is the following:

Proportion significances at the 0.05 level
aov:     0.05
lmer.1:  0.148
lmer.2:  0.148
lmer.3:  0.151

Proportion significances at the 0.01 level
aov:     0.006
lmer.1:  0.076
lmer.2:  0.076
lmer.3:  0.077

Proportion significances at the 0.001 level
aov:     0.001
lmer.1:  0.047
lmer.2:  0.047
lmer.3:  0.047

which is based on 1000 simulations (and takes about 5 min on my PowerMac
G5). The different models fitted are:

fm.aov <- aov(y ~ Treat + Error(Block/Treat), data = dat)
fm.lmer.1 <- lmer(y ~ Treat + (Treat|Block), data = dat)
fm.lmer.2 <- lmer(y ~ Treat + (Treat-1|Block), data = dat)
fm.lmer.3 <- lmer(y ~ Treat + (1|Block) + (Treat-1|Block), data = dat)

It seems that, depending on the level of the test, lmer gives roughly 3
to 50 times too many significances. The first two lmer models seem to
give identical results (for a two-level treatment factor they are
reparameterizations of the same unstructured 2 x 2 covariance for the
Block random effects), whereas the third (which I think perhaps is the
one that best represents the data generated by the simulation) differs
slightly. In running the
simulations, warnings like this are occasionally generated:

Warning message:
optim or nlminb returned message false convergence (8)
  in: "LMEoptimize<-"(`*tmp*`, value = list(maxIter = 200, tolerance =
1.49011611938477e-08,

They seem to derive from the third of the lmer models. Perhaps there is
some numerical issue in the lmer function? From running SimMixed()
several times, I have noticed that large p-values (say, larger than 0.5)
agree very well between lmer and aov, but there seems to be a systematic
discrepancy for smaller p-values, where lmer gives smaller values than
aov. The F-values agree between all analyses (except for fm.lmer.3 when
there is a warning), so there is a systematic difference between lmer
and aov in how a p-value is obtained from the F-value, which becomes
severe for small p-values.
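
If the discrepancy is only in how a p-value is obtained from the
F-value, my guess is that it comes down to the denominator degrees of
freedom. A sketch of what I mean (my own illustration, not code from
either package; the denominator df attributed to lmer is a guess):

# In this design aov tests Treat in the Block:Treat stratum on (1, 3) df.
# If lmer instead uses something close to the residual count (about 318
# of the 320 observations; this is an assumption), the same F statistic
# translates into a much smaller p-value.
F.obs <- 6.5                            # a hypothetical F value for Treat
pf(F.obs, 1, 3,   lower.tail = FALSE)   # roughly 0.08
pf(F.obs, 1, 318, lower.tail = FALSE)   # roughly 0.011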



My output from sessionInfo()

R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"
"datasets"  "base"

other attached packages:
      lme4   lattice    Matrix
  "0.98-1" "0.12-11"  "0.99-3"



Pasted code for the SimMixed function (some lines might wrap):

# This function generates n.sims random data sets for a design with 4
# blocks, 2 treatments applied to each block, and 40 replicate
# observations for each block-treatment combination. There is no true
# fixed treatment effect, so a statistical significance of a test for
# a fixed treatment effect ought to occur with a probability equal to
# the nominal level of the test. Four tests are applied to each
# simulated data set: the classical aov and three versions of lmer,
# corresponding to different model formulations. The proportions of
# tests for a fixed treatment effect that are significant at the
# 0.05, 0.01, and 0.001 levels are printed, as well as the p-values for
# the last of the simulations. In my runs, lmer gives significance
# more often than indicated by the nominal level, for each of the
# three models, whereas aov is OK. The package lme4 needs to be loaded
# to run the code.

SimMixed <- function(n.sims = 1) {
   k <- 4                # number of blocks
   n <- 40               # num obs per block X treatment combination
   m1 <- 1.0             # fixed effect of level 1 of treatment
   m2 <- m1              # fixed effect of level 2 of treatment
   sd.block <- 0.5       # SD of block random effect
   sd.block.trt <- 1.0   # SD of random effect for block X treatment
   sd.res <- 0.1         # Residual SD
   Block <- factor( rep(1:k, each=2*n) )
   Treat <- factor( rep( rep(c("Tr1","Tr2"), k), each=n) )
   m <- rep( rep(c(m1, m2), k), each=n) # fixed effects
   # storage for p-values
   p.aov <- rep(0, n.sims)
   p.lmer.1 <- rep(0, n.sims)
   p.lmer.2 <- rep(0, n.sims)
   p.lmer.3 <- rep(0, n.sims)
   for (i in 1:n.sims) {
     # first get block and treatment random deviations
     b <- rep( rep(rnorm(k, 0, sd.block), each=2) +
              rnorm(2*k, 0, sd.block.trt), each=n )
     # then get response
     y <- m + b + rnorm(2*k*n, 0, sd.res)
     dat <- data.frame(Block, Treat, y)
     # perform the tests
     fm.aov <- aov(y ~ Treat+Error(Block/Treat), data = dat)
     fm.lmer.1 <- lmer(y ~ Treat+(Treat|Block), data = dat)
     fm.lmer.2 <- lmer(y ~ Treat+(Treat-1|Block), data = dat)
     fm.lmer.3 <- lmer(y ~ Treat+(1|Block)+(Treat-1|Block), data = dat)
     # store the p-values
     p.aov[i] <- summary(fm.aov)$"Error: Block:Treat"[[1]]$"Pr(>F)"[1]
     p.lmer.1[i] <- anova(fm.lmer.1)[6]
     p.lmer.2[i] <- anova(fm.lmer.2)[6]
     p.lmer.3[i] <- anova(fm.lmer.3)[6]
   }

   cat("\nProportion significances at the 0.05 level \n")
   cat("aov:    ", sum(p.aov<0.05)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.05)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.05)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.05)/n.sims, "\n")

   cat("\nProportion significances at the 0.01 level \n")
   cat("aov:    ", sum(p.aov<0.01)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.01)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.01)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.01)/n.sims, "\n")

   cat("\nProportion significances at the 0.001 level \n")
   cat("aov:    ", sum(p.aov<0.001)/n.sims, "\n")
   cat("lmer.1: ", sum(p.lmer.1<0.001)/n.sims, "\n")
   cat("lmer.2: ", sum(p.lmer.2<0.001)/n.sims, "\n")
   cat("lmer.3: ", sum(p.lmer.3<0.001)/n.sims, "\n")

   cat("\nFinal aov analysis: \n")
   print(summary(fm.aov)$"Error: Block:Treat")
   cat("\nFinal lmer analysis 1: \n")
   print(anova(fm.lmer.1))
   cat("\nFinal lmer analysis 2: \n")
   print(anova(fm.lmer.2))
   cat("\nFinal lmer analysis 3: \n")
   print(anova(fm.lmer.3))
}




--
Olof Leimar, Professor
Department of Zoology
Stockholm University
SE-106 91 Stockholm
Sweden

[hidden email]
http://www.zoologi.su.se/research/leimar/

Re: lmer p-values are sometimes too small

Douglas Bates
I have just uploaded version 0.99-6 of the Matrix package to the
incoming area at CRAN.  It should appear on the archives in the next
day or two.

In this version all degrees of freedom, test statistics and p-values
have been removed from the summary, show and anova methods.  I agree
with John Maindonald that it is better not to give any p-values than
to give misleading, sometimes seriously misleading, p-values.
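
For readers who nonetheless want some assessment of a fixed effect, one
commonly used fallback is a likelihood ratio test between maximum
likelihood fits.  This is only a sketch, written in the argument syntax
of later lme4 releases rather than the version discussed here, and the
chi-square reference can itself be anti-conservative with only a
handful of groups, so treat it as a rough guide rather than an exact
test:

# ML fits of the null and treatment models for the SimMixed data
fm0 <- lmer(y ~ 1     + (1|Block) + (Treat-1|Block), data = dat, REML = FALSE)
fm1 <- lmer(y ~ Treat + (1|Block) + (Treat-1|Block), data = dat, REML = FALSE)
anova(fm0, fm1)   # likelihood ratio test for the Treat fixed effect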

All the code for lmer is, of course, open source.  At present it
resides in the Matrix package but it may return to the lme4 package
following the release of R-2.3.0.  The sources for the Matrix package
are available at

  https://svn.r-project.org/R-packages/trunk/Matrix

An experimental version of lmer based on a supernodal sparse Cholesky
factorization is at

  https://svn.r-project.org/R-packages/branches/Matrix-mer2

That is the current development branch.  Once I get issues with the
PQL and Laplace methods for GLMMs resolved, this branch will be merged
back into the trunk.  If you are only interested in linear mixed
models, it is better to work with this branch.

In both branches there is an R function called getFixDF that currently
is a stub returning incorrect answers (as indicated in the comments).
At present I don't know how to get correct answers for the range of models
that can be fit by lmer.  I welcome submissions of patches.
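
For the balanced design in Olof's SimMixed, at least, the reference df
for the Treat test can be read directly off the aov fit; it is the
general case that I do not know how to handle.  A sketch for that one
design only, not a getFixDF patch:

# Treat is tested in the Block:Treat error stratum, whose residual df is
# (k - 1) * (t - 1) = (4 - 1) * (2 - 1) = 3 for 4 blocks and 2 treatments.
summary(fm.aov)$"Error: Block:Treat"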


On 1/6/06, Olof Leimar <[hidden email]> wrote:

> This concerns whether p-values from lmer can be trusted. From
> simulations, it seems that lmer can produce very small, and probably
> spurious, p-values. I realize that lmer is not yet a finished product.
> Is it likely that the problem will be fixed in a future release of the
> lme4 package?
>
> [...]
