# "simulate" does not include variability in parameter estimation

4 messages
Open this post in threaded view
|

## "simulate" does not include variability in parameter estimation

 Hello, All:        The default "simulate" method for lm and glm seems to ignore the sampling variance of the parameter estimates;  see the trivial lm and glm examples below.  Both these examples estimate a mean with formula = x~1.  In both cases, the variance of the estimated mean is 1.              * In the lm example with x0 = c(-1, 1), var(x0) = 2, and var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064.  Shouldn't it be 3 = var(mean(x0)) + var(x0) = (2/2) + 2?              * In the glm example with x1=1, var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't it be 2 = var(glm estimate of the mean) + var(simulated Poisson distribution) = 1 + 1?        I'm asking, because I've recently written "simulate" methods for objects of class stats::glm and BMA::bic.glm, where my primary interest was simulating the predicted mean with "newdata".  I'm doing this, so I can get Monte Carlo prediction intervals.  My current code for "simulate.glm" and "simulate.bic.glm" are available in the development version of the "Ecfun" package on GitHub: https://github.com/sbgraves237/Ecfun       Comparing my new code with "stats:::simulate.lm" raises the following questions in my mind regarding "simulate" of a fit object:              1.  Shouldn't "simulate" start by simulating the random variability in the estimated parameters?  I need that for my current application.  If a generic "simulate" function should NOT include this, what should we call something that does include this?  And how does the current stats:::simulate.lm behavior fit with this?              2.  Shouldn't "simulate" of a model fit include an option for "newdata"?  I need that for my application.              3.  By comparing with "predict.glm", I felt I needed an argument 'type = c("link", "response")'.  "predict.glm" has an argument 'type = c("link", "response", "terms")'.  I didn't need "terms", so I didn't take the time to code that.  However, a general "simulate" function should probably include that.              4.  My application involves assumed Poisson counts.  I need to simulate those as well.  If I combined those with "simulate.glm", what would I call them?  I can't use the word "response", because that's already used with a different meaning. Might "observations" be the appropriate term?        What do you think?        Thanks,        Spencer Graves  > x0 <- c(-1, 1)  > var(x0) [1] 2  > fit0 <- lm(x0~1)  > vcov(fit0)              (Intercept) (Intercept)           1  > sim0 <- simulate(fit0, 10000, 1)  > var(unlist(sim0)) [1] 2.006408  > x1 <- 1  > fit1 <- glm(x1~1, poisson)  > coef(fit1)   (Intercept) 4.676016e-11  > exp(coef(fit1)) (Intercept)            1  > vcov(fit1)              (Intercept) (Intercept)   0.9999903  > sim1 <- simulate(fit1, 10000, 1)  > var(unlist(sim1)) [1] 1.00617  > sessionInfo() R version 3.6.2 (2019-12-12) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina 10.15.2 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats     graphics  grDevices utils     datasets  methods base loaded via a namespace (and not attached): [1] compiler_3.6.2 tools_3.6.2 ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: "simulate" does not include variability in parameter estimation

 On 26/12/2019 11:14 p.m., Spencer Graves wrote: > Hello, All: > > >         The default "simulate" method for lm and glm seems to ignore the > sampling variance of the parameter estimates;  see the trivial lm and > glm examples below.  Both these examples estimate a mean with formula = > x~1.  In both cases, the variance of the estimated mean is 1. That's how it's documented to operate.  Nothing in the help page suggests it would try to simulate parameter values.  Indeed, it doesn't have enough information on the distribution to sample from:  the appropriate distribution to simulate from if you want to include uncertainty in the parameter estimates is the posterior distribution, but lm and glm take a classical point of view, not a Bayesian point of view, so they have no concept of a posterior. >               * In the lm example with x0 = c(-1, 1), var(x0) = 2, and > var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064.  Shouldn't it be 3 > = var(mean(x0)) + var(x0) = (2/2) + 2? That calculation ignores the uncertainty in the estimation of sigma. Duncan Murdoch > > >               * In the glm example with x1=1, > var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't > it be 2 = var(glm estimate of the mean) + var(simulated Poisson > distribution) = 1 + 1? > > >         I'm asking, because I've recently written "simulate" methods for > objects of class stats::glm and BMA::bic.glm, where my primary interest > was simulating the predicted mean with "newdata".  I'm doing this, so I > can get Monte Carlo prediction intervals.  My current code for > "simulate.glm" and "simulate.bic.glm" are available in the development > version of the "Ecfun" package on GitHub: > > > https://github.com/sbgraves237/Ecfun> > >         Comparing my new code with "stats:::simulate.lm" raises the > following questions in my mind regarding "simulate" of a fit object: > > >               1.  Shouldn't "simulate" start by simulating the random > variability in the estimated parameters?  I need that for my current > application.  If a generic "simulate" function should NOT include this, > what should we call something that does include this?  And how does the > current stats:::simulate.lm behavior fit with this? > > >               2.  Shouldn't "simulate" of a model fit include an option > for "newdata"?  I need that for my application. > > >               3.  By comparing with "predict.glm", I felt I needed an > argument 'type = c("link", "response")'.  "predict.glm" has an argument > 'type = c("link", "response", "terms")'.  I didn't need "terms", so I > didn't take the time to code that.  However, a general "simulate" > function should probably include that. > > >               4.  My application involves assumed Poisson counts.  I need > to simulate those as well.  If I combined those with "simulate.glm", > what would I call them?  I can't use the word "response", because that's > already used with a different meaning. Might "observations" be the > appropriate term? > > >         What do you think? >         Thanks, >         Spencer Graves > > >   > x0 <- c(-1, 1) >   > var(x0) > [1] 2 >   > fit0 <- lm(x0~1) >   > vcov(fit0) >               (Intercept) > (Intercept)           1 >   > sim0 <- simulate(fit0, 10000, 1) >   > var(unlist(sim0)) > [1] 2.006408 >   > x1 <- 1 >   > fit1 <- glm(x1~1, poisson) >   > coef(fit1) >    (Intercept) > 4.676016e-11 >   > exp(coef(fit1)) > (Intercept) >             1 >   > vcov(fit1) >               (Intercept) > (Intercept)   0.9999903 >   > sim1 <- simulate(fit1, 10000, 1) >   > var(unlist(sim1)) > [1] 1.00617 >   > sessionInfo() > R version 3.6.2 (2019-12-12) > Platform: x86_64-apple-darwin15.6.0 (64-bit) > Running under: macOS Catalina 10.15.2 > > Matrix products: default > BLAS: > /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib > LAPACK: > /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats     graphics  grDevices utils     datasets  methods base > > loaded via a namespace (and not attached): > [1] compiler_3.6.2 tools_3.6.2 > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel> ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel