bootstrapping respecting subject level information

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

bootstrapping respecting subject level information

solcita
Hi there,

This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
 
I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:

Subject = rep(c("S1","S2","S3","S4"),4)
Num     = rep(c("singular","plural"),8)
Gram    = rep(c("gram","gram","ungram","ungram"),4)
RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
data    = data.frame(Subject,Num,Gram,RT)

This is the code I used to get the empirical interaction value:

summary(lm(RT ~ Num*Gram, data=data))

As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:

#Function to create the statistic to be boostrapped
boot.huber <- function(data, indices) {
data <- data[indices, ] #select obs. in bootstrap sample
mod <- lm(RT ~ Num*Gram, data=data)
coefficients(mod)       #return coefficient vector
}

#Generate bootstrap estimate
data.boot <- boot(data, boot.huber, 1999)

#Get confidence interval
boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction

My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)

Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?

Thanks a lot for your help/advice!
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bootstrapping respecting subject level information

Bart Joosen
This post has NOT been accepted by the mailing list yet.
See the strata argument in ?boot.
Reply | Threaded
Open this post in threaded view
|

Re: bootstrapping respecting subject level information

David Winsemius
In reply to this post by solcita

On Jul 3, 2013, at 7:19 PM, Sol Lago wrote:

> Hi there,
>
> This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
>
> I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
>
> Subject = rep(c("S1","S2","S3","S4"),4)
> Num     = rep(c("singular","plural"),8)
> Gram    = rep(c("gram","gram","ungram","ungram"),4)
> RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
> data    = data.frame(Subject,Num,Gram,RT)
>
> This is the code I used to get the empirical interaction value:
>
> summary(lm(RT ~ Num*Gram, data=data))
>
> As you can see, the interaction between my two factors is -348.

That depends on what you mean by "the interaction between my two factors". It is almost never a good idea to attempt interpretation of interaction coefficients, and is always preferable to check the predictions of hte model.

> I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
>
> #Function to create the statistic to be boostrapped
> boot.huber <- function(data, indices) {
> data <- data[indices, ] #select obs. in bootstrap sample
> mod <- lm(RT ~ Num*Gram, data=data)
> coefficients(mod)       #return coefficient vector
> }
>
> #Generate bootstrap estimate
> data.boot <- boot(data, boot.huber, 1999)
>
> #Get confidence interval
> boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
>
> My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1,

What does that mean?

> not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)

It's doing it by selecting randomly entire rows. It is not "shuffling within rows" for selected subjects.
>
> Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?

It would be doing so because that is the way you set up the indexing. The column ordering tof the data within subjects is not permuted.

I do think you are beyond your understanding of the statstical principles that you are attempting to use and would be safer to consult with a statsitician.


--
David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bootstrapping respecting subject level information

Andrew Robinson-6
In reply to this post by solcita
I'd like to preface this answer by suggesting that if you have multiple
measurements within subjects then you should possibly be thinking about
using mixed-effects models.  Here you have a balanced design and seem
to be thinking about a constrained bootstrap, but I don't know whether
the resulting distribution will have the right properties - others on
the list may.

That said, I think that this needs the 'strata' argument of the boot function.
See the help.  Something like

data.boot <- boot(data, boot.huber, 1999, strata = Subject)

(not tested)

Cheers

Andrew

On Thu, Jul 4, 2013 at 12:19 PM, Sol Lago <[hidden email]> wrote:

> Hi there,
>
> This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
>
> I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
>
> Subject = rep(c("S1","S2","S3","S4"),4)
> Num     = rep(c("singular","plural"),8)
> Gram    = rep(c("gram","gram","ungram","ungram"),4)
> RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
> data    = data.frame(Subject,Num,Gram,RT)
>
> This is the code I used to get the empirical interaction value:
>
> summary(lm(RT ~ Num*Gram, data=data))
>
> As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
>
> #Function to create the statistic to be boostrapped
> boot.huber <- function(data, indices) {
> data <- data[indices, ] #select obs. in bootstrap sample
> mod <- lm(RT ~ Num*Gram, data=data)
> coefficients(mod)       #return coefficient vector
> }
>
> #Generate bootstrap estimate
> data.boot <- boot(data, boot.huber, 1999)
>
> #Get confidence interval
> boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
>
> My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)
>
> Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?
>
> Thanks a lot for your help/advice!
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Andrew Robinson
Deputy Director, CEBRA
Senior Lecturer in Applied Statistics                      Tel: +61-3-8344-6410
Department of Mathematics and Statistics            Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: [hidden email]    Website: http://www.ms.unimelb.edu.au

FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bootstrapping respecting subject level information

Andrew Robinson-6
In reply to this post by David Winsemius
I think that in the case of a 2*2 balanced, replicated design such as
this one, interpreting the interaction should be safe.

Cheers

Andrew

On Fri, Jul 5, 2013 at 9:38 AM, David Winsemius <[hidden email]> wrote:

>
> On Jul 3, 2013, at 7:19 PM, Sol Lago wrote:
>
>> Hi there,
>>
>> This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
>>
>> I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
>>
>> Subject = rep(c("S1","S2","S3","S4"),4)
>> Num     = rep(c("singular","plural"),8)
>> Gram    = rep(c("gram","gram","ungram","ungram"),4)
>> RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
>> data    = data.frame(Subject,Num,Gram,RT)
>>
>> This is the code I used to get the empirical interaction value:
>>
>> summary(lm(RT ~ Num*Gram, data=data))
>>
>> As you can see, the interaction between my two factors is -348.
>
> That depends on what you mean by "the interaction between my two factors". It is almost never a good idea to attempt interpretation of interaction coefficients, and is always preferable to check the predictions of hte model.
>
>> I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
>>
>> #Function to create the statistic to be boostrapped
>> boot.huber <- function(data, indices) {
>> data <- data[indices, ] #select obs. in bootstrap sample
>> mod <- lm(RT ~ Num*Gram, data=data)
>> coefficients(mod)       #return coefficient vector
>> }
>>
>> #Generate bootstrap estimate
>> data.boot <- boot(data, boot.huber, 1999)
>>
>> #Get confidence interval
>> boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
>>
>> My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1,
>
> What does that mean?
>
>> not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)
>
> It's doing it by selecting randomly entire rows. It is not "shuffling within rows" for selected subjects.
>>
>> Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?
>
> It would be doing so because that is the way you set up the indexing. The column ordering tof the data within subjects is not permuted.
>
> I do think you are beyond your understanding of the statstical principles that you are attempting to use and would be safer to consult with a statsitician.
>
>
> --
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Andrew Robinson
Deputy Director, CEBRA
Senior Lecturer in Applied Statistics                      Tel: +61-3-8344-6410
Department of Mathematics and Statistics            Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: [hidden email]    Website: http://www.ms.unimelb.edu.au

FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bootstrapping respecting subject level information

Joshua Wiley-2
In reply to this post by solcita
Hi,

It is not the easiest to follow code, but when I was working at UCLA,
I wrote a page demonstrating a multilevel bootstrap, where I use a two
stage sampler, (re)sampling at each level.  In your case, could be
first draw subjects, then draw observations within subjects.  A strata
only option does not resample all sources of variability, which are:

1) which subjects you get and
2) which observations within those

The page is here: http://www.ats.ucla.edu/stat/r/dae/melogit.htm

As a side note, it demonstrates a mixed effects model in R, although
as I mentioned it is not geared for beginners.

Cheers,

Josh



On Wed, Jul 3, 2013 at 7:19 PM, Sol Lago <[hidden email]> wrote:

> Hi there,
>
> This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
>
> I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
>
> Subject = rep(c("S1","S2","S3","S4"),4)
> Num     = rep(c("singular","plural"),8)
> Gram    = rep(c("gram","gram","ungram","ungram"),4)
> RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
> data    = data.frame(Subject,Num,Gram,RT)
>
> This is the code I used to get the empirical interaction value:
>
> summary(lm(RT ~ Num*Gram, data=data))
>
> As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
>
> #Function to create the statistic to be boostrapped
> boot.huber <- function(data, indices) {
> data <- data[indices, ] #select obs. in bootstrap sample
> mod <- lm(RT ~ Num*Gram, data=data)
> coefficients(mod)       #return coefficient vector
> }
>
> #Generate bootstrap estimate
> data.boot <- boot(data, boot.huber, 1999)
>
> #Get confidence interval
> boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
>
> My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)
>
> Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?
>
> Thanks a lot for your help/advice!
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: bootstrapping respecting subject level information

Andrew Robinson-6
Josh's comment prompted me to check mty go-to reference. Davison and
Hinckley (1997 Section 3.8) recommend sampling the Subjects, but not
within the Subjects.

Cheers

Andrew

On Thu, Jul 04, 2013 at 05:53:58PM -0700, Joshua Wiley wrote:

> Hi,
>
> It is not the easiest to follow code, but when I was working at UCLA,
> I wrote a page demonstrating a multilevel bootstrap, where I use a two
> stage sampler, (re)sampling at each level.  In your case, could be
> first draw subjects, then draw observations within subjects.  A strata
> only option does not resample all sources of variability, which are:
>
> 1) which subjects you get and
> 2) which observations within those
>
> The page is here: http://www.ats.ucla.edu/stat/r/dae/melogit.htm
>
> As a side note, it demonstrates a mixed effects model in R, although
> as I mentioned it is not geared for beginners.
>
> Cheers,
>
> Josh
>
>
>
> On Wed, Jul 3, 2013 at 7:19 PM, Sol Lago <[hidden email]> wrote:
> > Hi there,
> >
> > This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
> >
> > I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
> >
> > Subject = rep(c("S1","S2","S3","S4"),4)
> > Num     = rep(c("singular","plural"),8)
> > Gram    = rep(c("gram","gram","ungram","ungram"),4)
> > RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
> > data    = data.frame(Subject,Num,Gram,RT)
> >
> > This is the code I used to get the empirical interaction value:
> >
> > summary(lm(RT ~ Num*Gram, data=data))
> >
> > As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
> >
> > #Function to create the statistic to be boostrapped
> > boot.huber <- function(data, indices) {
> > data <- data[indices, ] #select obs. in bootstrap sample
> > mod <- lm(RT ~ Num*Gram, data=data)
> > coefficients(mod)       #return coefficient vector
> > }
> >
> > #Generate bootstrap estimate
> > data.boot <- boot(data, boot.huber, 1999)
> >
> > #Get confidence interval
> > boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
> >
> > My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)
> >
> > Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?
> >
> > Thanks a lot for your help/advice!
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://joshuawiley.com/
> Senior Analyst - Elkhart Group Ltd.
> http://elkhartgroup.com
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Andrew Robinson  
Deputy Director, ACERA
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Methods of Statistical Model Estimation (CRC, 2013)
http://www.crcpress.com/product/isbn/9781439858028
Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.