[R] bootstrap bca confidence intervals for large number of statistics in one model; library("boot")

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[R] bootstrap bca confidence intervals for large number of statistics in one model; library("boot")

Jacob Wegelin-3

Sometimes one might like to obtain pointwise bootstrap bias-corrected,
accelerated (BCA) confidence intervals for a large number of statistics
computed from a single dataset.  For instance, one might like to get
(so as to plot graphically) bootstrap confidence bands for the fitted
values in a regression model.

(Example: Chiu S et al., Early Acceleration of Head Circumference in
Children with Fragile X Syndrome and Autism. Journal of Developmental &
Behavioral Pediatrics 2007;In press. In this paper we plot the
estimated trajectories of head circumference for two different
groups, computed by linear mixed-effects models, with confidence bands
obtained by bootstrap.)

Below is a toy example of how to do this using the "boot" library.
We obtain BCA CIs for all three regression parameters and for the fitted
values at 50 levels of the predictor.

set.seed(1234567)
x<-runif(150)
y<-2/3 + pi * x^2 + runif(length(x))/2
plot(x,y)
DAT<-data.frame(x,y)
NEWDATA<-data.frame(x=seq(min(x), max(x), length=50))
library('boot')
myfn<-function(data, whichrows) {
        TheseData<-data[whichrows,]
        thisLM<-lm( y~poly(x,2), data=TheseData)
        thisFit<-predict(thisLM, newdata=NEWDATA)
        c(
                coef(summary(thisLM))[,"Estimate"]
                , thisFit)
}
bootObj<-boot( data=DAT, statistic=myfn, R=1000 )
names(bootObj)
dim(bootObj$t)
sofar<-t(sapply( 1:ncol(bootObj$t), function(thiscolumn) boot.ci(bootObj, type='bca', index=thiscolumn)$bca[4:5] ))
colnames(sofar)<-c("lo", "hi")
NEWDATA<-cbind(NEWDATA, sofar[4:nrow(sofar),])
thecoefs<-sofar[1:3,]
lines( NEWDATA$x, NEWDATA$lo, col='red')
lines( NEWDATA$x, NEWDATA$hi, col='red')

Note: To get boot.ci to run with type='bca' it seems necessary to have a
large value of R.  Otherwise boot.ci returns an error, in my (limited)
experience.

Thanks in advance for any critiques.  (For instance, is there an easier or more elegant way?)

Jacob Wegelin

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

[R] bootstrap syntax + bootstrap for teaching

Tim Hesterberg
Previous subject:
bootstrap bca confidence intervals for large number of statistics in one model; library("boot")

Jacob Wegelin asked for an easier way to do many bootstrap confidence
intervals for regression output.
The syntax would be easier with S+Resample, example below.
You create an ordinary lm object, then do e.g.
        boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata=NEWDATA)))
        limits.bca(boot)


---- RESAMPLING FOR TEACHING ----

I would encourage people to consider using S+Resample for teaching.
There is a free student version of S+, and S+Resample is easier for
students -- easier syntax (in general, not just this example), plus a
menu interface.  There are free chapters and packages you can use to
supplement a statistics course with resampling.  See:
http://www.insightful.com/Hesterberg/bootstrap/#Reference.introStat

I'll give a workshop on resampling for teaching at the USCOTS conference
(U.S. Conference On Teaching Statistics) on May 16.
http://www.causeweb.org/uscots
http://www.causeweb.org/workshop/hesterberg/


set.seed(0)
x <- runif(150)
y <- 2/3 + pi * x^2 + runif(length(x))/2
plot(x,y)
DAT <- data.frame(x,y)
NEWDATA <- data.frame(x=seq(min(x), max(x), length=50))
library(resample)
fit <- lm(y ~ x + x^2, data=DAT)
boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata = NEWDATA)))
CIs <- limits.bca(boot)
lines(NEWDATA$x, CIs[4:53,1], col="red")
lines(NEWDATA$x, CIs[4:53,4], col="red")
CIs[1:3,]

I used x + x^2 in the model rather than poly(x,2), because poly is
data dependent, so regression coefficients cannot be used for new
data, and confidence intervals for the coefficients are not meaningful.

Comment - the default is 1000 bootstrap samples; that isn't really
enough for BCa intervals, because the BCa calculations magnify the
Monte Carlo standard error by roughly a factor of two.

Jacob Wegelin wrote:

>Sometimes one might like to obtain pointwise bootstrap bias-corrected,
>accelerated (BCA) confidence intervals for a large number of statistics
>computed from a single dataset.  For instance, one might like to get
>(so as to plot graphically) bootstrap confidence bands for the fitted
>values in a regression model.
>
>(Example: Chiu S et al., Early Acceleration of Head Circumference in
>Children with Fragile X Syndrome and Autism. Journal of Developmental &
>Behavioral Pediatrics 2007;In press. In this paper we plot the
>estimated trajectories of head circumference for two different
>groups, computed by linear mixed-effects models, with confidence bands
>obtained by bootstrap.)
>
>Below is a toy example of how to do this using the "boot" library.
>We obtain BCA CIs for all three regression parameters and for the fitted
>values at 50 levels of the predictor.
>
>set.seed(1234567)
>x<-runif(150)
>y<-2/3 + pi * x^2 + runif(length(x))/2
>plot(x,y)
>DAT<-data.frame(x,y)
>NEWDATA<-data.frame(x=seq(min(x), max(x), length=50))
>library('boot')
>myfn<-function(data, whichrows) {
> TheseData<-data[whichrows,]
> thisLM<-lm( y~poly(x,2), data=TheseData)
> thisFit<-predict(thisLM, newdata=NEWDATA)
> c(
> coef(summary(thisLM))[,"Estimate"]
> , thisFit)
>}
>bootObj<-boot( data=DAT, statistic=myfn, R=1000 )
>names(bootObj)
>dim(bootObj$t)
>sofar<-t(sapply( 1:ncol(bootObj$t), function(thiscolumn) boot.ci(bootObj, type='bca', index=thiscolumn)$bca[4:5] ))
>colnames(sofar)<-c("lo", "hi")
>NEWDATA<-cbind(NEWDATA, sofar[4:nrow(sofar),])
>thecoefs<-sofar[1:3,]
>lines( NEWDATA$x, NEWDATA$lo, col='red')
>lines( NEWDATA$x, NEWDATA$hi, col='red')
>
>Note: To get boot.ci to run with type='bca' it seems necessary to have a
>large value of R.  Otherwise boot.ci returns an error, in my (limited)
>experience.
>
>Thanks in advance for any critiques.  (For instance, is there an easier or more elegant way?)

Caveat - I helped create S+Resample.

========================================================
| Tim Hesterberg       Senior Research Scientist       |
| [hidden email]  Insightful Corp.                |
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|                      www.insightful.com/Hesterberg   |
========================================================
Download S+Resample from www.insightful.com/downloads/libraries

Bootstrap short courses:  May 21-22 Philadelphia, Oct 10-11 San Francisco.
                          2-3 May UK, 3-4 Oct UK.
                http://www.insightful.com/services/training.asp

Workshop on resampling for teaching:  May 16 Ohio State
                http://www.causeweb.org/workshop/hesterberg/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.