# Help with boot()

 Dear List: I'm trying to use the boot function to estimate some standard errors. I actually programmed a bootstrap using some homebrew code and it worked fine. But, I am trying to use the more efficient boot function. I have placed some sample data for replication of my problem at the bottom of this email. For the sample problem, I have 10 subjects each with 5 observations Y_t = (t_1, ..., t_5). Consider these 'longitudinal' data. So, I use reshape to put into the long format in order to regress the observations onto time, a time-varying covariate. I do the regression for each individual separately: Y_{t} = \mu + \beta(time) + \epsilon_{t} To get the statistic of interest (sigma_theta), I do the following with the original data: theta <- numeric(10) sigma <- numeric(10) for(i in 1:10){      tmp      <- subset(long, ID==i)      tmp.lm   <- lm(obs ~ time, tmp)      theta[i] <- coef(tmp.lm)[2]      sigma[i] <- sum((tmp.lm$residuals)^2)/ (dim(tmp)[1] -2) } sigma_theta <- var(theta) - ( mean(sigma) / 10) # This is my point estimate All that I do is perform an OLS regression for each subject individually and use these estimates to derive a measure of between-unit variability for the parameter \beta. Now, I want to put a bootstrap confidence interval around sigma_theta using the values from the empirical distribution. I realize I could be using lmer() here, but I am experiementing with a method for getting some variance components. Simply programming this was not too difficult, albeit maybe a little hokey. I simply resampled from the original data (DD), reshaped into the long format, and then repeated the loop above 4000 times. It was basically the following in a loop: p <- sample(10, replace=TRUE) pD <- DD[p,] pD$ID <- seq(1:10) # This was needed so reshape doesn't give error regarding duplicate IDs plong <- reshape(pD, idvar='ID', varying=list(c('t1','t2','t3','t4','t5')), v.names='obs', direction='long') Now, I have read through Bootstrapping regression models by J. Fox and of course the boot help as well as a few others online. I see how the boot function works for the most part and have been successful with basic things like means and medians. But, what is confusing me, and maybe I am overthinking this, is how to work with resampling when the data are in the long format. >From what I understand, I need to write the following function to compute my statistic of interest: my.fun <- function(DD,p){      E <- DD[p,]      var(theta) - ( mean(sigma) / 10)      } Then I do something like boot(DD, my.fun, 4000) Well, those of you experienced with this see this is not working. This is my effort to illustrate what I have done and see if anyone can offer a little didactic advice. Thanks, Harold R 2.2.0 Windows XP DD <- data.frame(ID= c(1,2,3,4,5,6,7,8,9,10), t1=c(61,65,57,46,47,43,53,72,53,72), t2 = c(72,85,68,74,85,58,62,96,54,98),t3=c(118,129,130,116,103,109,82,117,87, 114), t4=c(130,148,143,124,117,133,112,129,120,144) ,t5=c(176,174,201,157,148,152,156,154,138,177), z=c(170,194,187,156,155,150,138,154,149,167)) long <- reshape(DD, idvar='ID', varying=list(c('t1','t2','t3','t4','t5')), v.names='obs', direction='long')         [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide! http://www.R-project.org/posting-guide.html