Help with boot()

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Help with boot()

Doran, Harold
Dear List:

I'm trying to use the boot function to estimate some standard errors. I
actually programmed a bootstrap using some homebrew code and it worked
fine. But, I am trying to use the more efficient boot function. I have
placed some sample data for replication of my problem at the bottom of
this email. For the sample problem, I have 10 subjects each with 5
observations Y_t = (t_1, ..., t_5). Consider these 'longitudinal' data.
So, I use reshape to put into the long format in order to regress the
observations onto time, a time-varying covariate. I do the regression
for each individual separately:

Y_{t} = \mu + \beta(time) + \epsilon_{t}

To get the statistic of interest (sigma_theta), I do the following with
the original data:

theta <- numeric(10)
sigma <- numeric(10)
for(i in 1:10){
     tmp      <- subset(long, ID==i)
     tmp.lm   <- lm(obs ~ time, tmp)
     theta[i] <- coef(tmp.lm)[2]
     sigma[i] <- sum((tmp.lm$residuals)^2)/ (dim(tmp)[1] -2)
sigma_theta <- var(theta) - ( mean(sigma) / 10) # This is my point

All that I do is perform an OLS regression for each subject individually
and use these estimates to derive a measure of between-unit variability
for the parameter \beta. Now, I want to put a bootstrap confidence
interval around sigma_theta using the values from the empirical
distribution. I realize I could be using lmer() here, but I am
experiementing with a method for getting some variance components.

Simply programming this was not too difficult, albeit maybe a little
hokey. I simply resampled from the original data (DD), reshaped into the
long format, and then repeated the loop above 4000 times. It was
basically the following in a loop:

p  <-  sample(10, replace=TRUE)
pD <- DD[p,]
pD$ID <- seq(1:10) # This was needed so reshape doesn't give error
regarding duplicate IDs
plong <- reshape(pD, idvar='ID',
varying=list(c('t1','t2','t3','t4','t5')), v.names='obs',

Now, I have read through Bootstrapping regression models by J. Fox and
of course the boot help as well as a few others online. I see how the
boot function works for the most part and have been successful with
basic things like means and medians. But, what is confusing me, and
maybe I am overthinking this, is how to work with resampling when the
data are in the long format.

>From what I understand, I need to write the following function to
compute my statistic of interest: <- function(DD,p){
     E <- DD[p,]
     var(theta) - ( mean(sigma) / 10)

Then I do something like

boot(DD,, 4000)

Well, those of you experienced with this see this is not working. This
is my effort to illustrate what I have done and see if anyone can offer
a little didactic advice.

R 2.2.0
Windows XP

DD <- data.frame(ID= c(1,2,3,4,5,6,7,8,9,10),
t2 =

long <- reshape(DD, idvar='ID',
varying=list(c('t1','t2','t3','t4','t5')), v.names='obs',

        [[alternative HTML version deleted]]

[hidden email] mailing list
PLEASE do read the posting guide!