# When to use bootstrap confidence intervals?


## When to use bootstrap confidence intervals?

Hello,

I have a question regarding bootstrap confidence intervals. Suppose we have a data set consisting of single measurements, and that the measurements are independent but the distribution is unknown. If we want a confidence interval for the population mean, when should a bootstrap confidence interval be preferred over the elementary t interval?

I was hoping the answer would be "always", but some simple simulations suggest that this is incorrect. I simulated some data and calculated 95% elementary t intervals and 95% bootstrap BCA intervals (with the boot package). I calculated the proportion of confidence intervals lying entirely above the true mean, the proportion entirely below the true mean, and the proportion containing the true mean. I used a normal distribution and a t distribution with 3 df.

    library(boot)

    samplemean <- function(x, ind) mean(x[ind])

    ci.norm <- function(sample.size, n.samples, mu=0, sigma=1, boot.reps) {
      t.under <- 0; t.over <- 0
      bca.under <- 0; bca.over <- 0
      for (k in 1:n.samples) {
        x <- rnorm(sample.size, mu, sigma)
        b <- boot(x, samplemean, R = boot.reps)
        bci <- boot.ci(b, type="bca")
        if (mu < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
          t.under <- t.under + 1
        if (mu > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
          t.over <- t.over + 1
        if (mu < bci$bca[4]) bca.under <- bca.under + 1
        if (mu > bci$bca[5]) bca.over <- bca.over + 1
      }
      return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
                  bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
    }

    ci.t <- function(sample.size, n.samples, df, boot.reps) {
      t.under <- 0; t.over <- 0
      bca.under <- 0; bca.over <- 0
      for (k in 1:n.samples) {
        x <- rt(sample.size, df)
        b <- boot(x, samplemean, R = boot.reps)
        bci <- boot.ci(b, type="bca")
        if (0 < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
          t.under <- t.under + 1
        if (0 > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
          t.over <- t.over + 1
        if (0 < bci$bca[4]) bca.under <- bca.under + 1
        if (0 > bci$bca[5]) bca.over <- bca.over + 1
      }
      return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
                  bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
    }

Results:

    set.seed(1)

    ci.norm(sample.size = 10, n.samples = 1000, boot.reps = 1000)
    $t
    [1] 0.019 0.026 0.955

    $bca
    [1] 0.049 0.059 0.892

    ci.norm(sample.size = 20, n.samples = 1000, boot.reps = 1000)
    $t
    [1] 0.030 0.024 0.946

    $bca
    [1] 0.035 0.037 0.928

    ci.t(sample.size = 10, n.samples = 1000, df = 3, boot.reps = 1000)
    $t
    [1] 0.018 0.022 0.960

    $bca
    [1] 0.055 0.076 0.869

    Warning message:
    In norm.inter(t, adj.alpha) :
      extreme order statistics used as endpoints

    ci.t(sample.size = 20, n.samples = 1000, df = 3, boot.reps = 1000)
    $t
    [1] 0.027 0.014 0.959

    $bca
    [1] 0.054 0.047 0.899

I don't understand the warning message, but for these examples, the ordinary t interval appears to be better than the bootstrap BCA interval. I would really appreciate any recommendations anyone can give on when bootstrap confidence intervals should be used.

Thanks,
Mark

--
Mark Seeto
National Acoustic Laboratories, Australian Hearing

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
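On the warning in the post above: one plausible reading is that the BCa method shifts the nominal 2.5% and 97.5% levels, and that for some samples a shifted level lands at or beyond the ends of the sorted bootstrap replicates, so the smallest or largest replicate has to serve as the endpoint. The following is a minimal base-R sketch of the standard BCa level adjustment for the sample mean. The helper `bca_levels()` is hypothetical (it is not part of the boot package), and the `extreme` flag is only an assumed approximation of the condition that triggers the warning inside `boot.ci`.

```r
# Sketch of the BCa level adjustment for the sample mean, base R only.
# bca_levels() is a hypothetical helper, not a function from the boot package.
bca_levels <- function(x, R = 1000, conf = 0.95) {
  n <- length(x)
  t0 <- mean(x)

  # Nonparametric bootstrap replicates of the mean
  tstar <- replicate(R, mean(sample(x, n, replace = TRUE)))

  # Bias correction: where the observed mean sits among the replicates
  z0 <- qnorm(mean(tstar < t0))

  # Acceleration, from the empirical influence values of the mean
  infl <- x - t0
  a <- sum(infl^3) / (6 * sum(infl^2)^1.5)

  # Nominal tail levels and their BCa-adjusted versions
  alpha <- c((1 - conf) / 2, (1 + conf) / 2)
  z <- qnorm(alpha)
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))

  # Position of each endpoint among the R sorted replicates
  rk <- (R + 1) * adj

  list(z0 = z0, a = a, adj.alpha = adj, rank = rk,
       # Assumed form of the condition behind the warning
       extreme = any(rk <= 1 | rk >= R))
}

set.seed(1)
x <- rt(10, df = 3)   # small, heavy-tailed sample, as in the ci.t runs
str(bca_levels(x, R = 1000))
```

If `extreme` comes back `TRUE` for a given sample, raising the number of bootstrap replicates gives the adjusted levels more room in the tails. That would address the warning only; it would not by itself fix the undercoverage seen at small sample sizes.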

## Re: When to use bootstrap confidence intervals?

Just based on my limited understanding of bootstrapping and statistics in general, bootstrapping is effective but not magical - you can't reasonably expect any reliable inference to be drawn about the population based on a sample of 10, without any distributional assumptions. Your t interval looks good conditional on the fact that you know what distribution you used to simulate the data.

Mark Seeto wrote:

Hello, I have a question regarding bootstrap confidence intervals. Suppose we have a data set consisting of single measurements, and that the measurements are independent but the distribution is unknown. If we want a confidence interval for the population mean, when should a bootstrap confidence interval be preferred over the elementary t interval? I was hoping the answer would be "always", but some simple simulations suggest that this is incorrect. I simulated some data and calculated 95% elementary t intervals and 95% bootstrap BCA intervals (with the boot package). I calculated the proportion of confidence intervals lying entirely above the true mean, the proportion entirely below the true mean, and the proportion containing the true mean. I used a normal distribution and a t distribution with 3 df.