
How to do bootstrap for the complex sample design?

Fei xu

Hello,

Our survey is structured as follows: the study area is divided into 6 regions;
within each region, one urban community and one rural community are randomly selected;
then samples are randomly drawn from each selected urban and rural community.

The problem is that in each urban/rural stratum we have only one sample (one community).
In this case, how should we do the bootstrap?
 
Any comments or hints are greatly appreciated!
 
Faye        

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to do bootstrap for the complex sample design?

Robert A LaBudde
At 01:38 AM 11/4/2010, Fei xu wrote:

>Hello;
>
>Our survey is structured as : To be investigated area is divided
>into 6 regions,
>within each region, one urban community and one rural community are
>randomly selected,
>then samples are randomly drawn from each selected urban and rural community.
>
>The problem is that in urban/rural stratum, we only have one sample.
>In this case, how to do bootstrap?
>
>Any comments or hints are greatly appreciated!
>
>Faye

Just make a table of your data, with each row corresponding to a
measurement. Your columns will be Region, UrbanCommunity,
RuralCommunity and your response variables.

Bootstrap resampling is just generating random row indices into this
table, with replacement. I.e.,

N <- nrow(myTable)
index <- sample(1:N, N, replace = TRUE)

Then your resample is myTable[index,].
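A minimal runnable sketch of the row-resampling idea above. The table `myTable`, its columns (`person`, `Community`, `Region`, `y`) and all values are made up for illustration. Note that this resamples individual rows and ignores the strata and the community clusters, which is the issue the later replies take up.

```r
set.seed(42)

# Hypothetical survey table: one row per measurement,
# 6 regions x 2 communities (urban, rural) x 5 respondents each.
myTable <- expand.grid(
  person    = 1:5,
  Community = c("urban", "rural"),
  Region    = paste0("R", 1:6)
)
myTable$y <- rnorm(nrow(myTable), mean = 10, sd = 2)  # made-up response

N <- nrow(myTable)
B <- 2000  # number of bootstrap replicates

# Naive bootstrap: resample whole rows with replacement, ignoring the design,
# and recompute the statistic (here the overall mean) on each resample.
boot_means <- replicate(B, {
  index <- sample(1:N, N, replace = TRUE)
  mean(myTable[index, "y"])
})

se_naive <- sd(boot_means)  # bootstrap standard error of the mean
se_naive
```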

Because you chose UrbanCommunity and RuralCommunity randomly, this
shouldn't be a problem. The fact that you chose a subsample size of
1 means you won't be able to estimate within-region variances unless
you make some serious assumptions (e.g., UrbanCommunity effect
independent of Region effect).

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [hidden email]
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"


Re: How to do bootstrap for the complex sample design?

Tim Hesterberg
Faye wrote:
>Our survey is structured as : To be investigated area is divided into
>6 regions, within each region, one urban community and one rural
>community are randomly selected, then samples are randomly drawn from
>each selected urban and rural community.
>
>The problem is that in urban/rural stratum, we only have one sample.
>In this case, how to do bootstrap?

You are lucky that your sample size is 1.  If it were 2 you would
probably have proceeded without realizing that the answers were wrong.

Suppose you had two samples in each stratum.  If you proceed naturally,
drawing bootstrap samples of size 2 from each stratum, you would
underestimate the variance by a factor of 2.
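The factor of 2 can be checked exactly for a tiny stratum by enumerating all n^n equally likely bootstrap resamples. The helper `exact_boot_var_mean()` and the data `c(0, 2)` are hypothetical, for illustration only.

```r
# Exact bootstrap variance of the sample mean for a tiny stratum, found by
# enumerating all n^n equally likely resamples (only feasible for small n).
exact_boot_var_mean <- function(x) {
  n <- length(x)
  # every possible vector of n indices drawn with replacement
  idx <- as.matrix(expand.grid(rep(list(seq_len(n)), n)))
  means <- apply(idx, 1, function(i) mean(x[i]))
  mean((means - mean(means))^2)  # variance over the bootstrap distribution
}

x <- c(0, 2)                     # hypothetical stratum with n = 2
true_est <- var(x) / length(x)   # usual unbiased estimate s^2 / n
boot_est <- exact_boot_var_mean(x)
c(s2_over_n = true_est, bootstrap = boot_est, ratio = boot_est / true_est)
# s2_over_n = 1, bootstrap = 0.5, ratio = 0.5
```

Repeating this with `x <- c(1, 2, 4)` gives a ratio of 2/3, i.e. (n-1)/n for n = 3.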

In general, ordinary nonparametric bootstrap estimates of variance
are biased downward by a factor of (n-1)/n: exactly for the mean,
approximately for other statistics.  In multiple-sample and stratified
situations, the bias depends on the stratum sizes.
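For the sample mean the (n-1)/n factor follows in two lines, where Var_* is the variance over bootstrap resamples and s^2 is the usual variance with divisor n-1. The last line assumes independent resampling of n_h observations within each stratum h with fixed stratum weights W_h (notation introduced here for illustration); it shows why the bias depends on the stratum sizes: n_h = 2 gives the factor 1/2 above, and n_h = 1 gives a bootstrap variance of zero.

```latex
\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{n-1}{n}\,s^2,\\
\operatorname{Var}_*(\bar{x}^*) &= \frac{\hat\sigma^2}{n}
  = \frac{n-1}{n}\cdot\frac{s^2}{n},\\
\operatorname{Var}_*\Big(\sum_h W_h\,\bar{x}_h^*\Big)
  &= \sum_h W_h^2\,\frac{n_h-1}{n_h}\cdot\frac{s_h^2}{n_h}.
\end{aligned}
```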

Three remedies are:
* draw bootstrap samples of size n-1
* "bootknife" sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
* bootstrap from a kernel density estimate, with kernel covariance equal
  to the empirical covariance (computed with divisor n-1) divided by n.
The latter two are described in
Hesterberg, Tim C. (2004). "Unbiasing the Bootstrap: Bootknife Sampling vs. Smoothing." Proceedings of the Section on Statistics and the Environment, American Statistical Association, pp. 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
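A Monte Carlo sketch comparing the ordinary bootstrap with the three remedies for the variance of a sample mean, using a small made-up sample. Assumptions not in the text: a Gaussian kernel for the smoothed bootstrap (only the kernel covariance is specified above) and `B = 20000` replicates. Each ratio compares a bootstrap variance to s^2/n; the ordinary bootstrap should come out near (n-1)/n = 0.8 and the three remedies near 1.

```r
set.seed(1)

# Hypothetical single stratum; the values are made up.
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)
n <- length(x)
B <- 20000
target <- var(x) / n  # unbiased estimate s^2 / n of Var(mean)

# Ordinary bootstrap: n draws with replacement (biased low by (n-1)/n).
naive <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))

# Remedy 1: bootstrap samples of size n - 1.
n_minus_1 <- replicate(B, mean(x[sample.int(n, n - 1, replace = TRUE)]))

# Remedy 2: bootknife. Omit one observation at random (a jackknife sample),
# then draw a bootstrap sample of size n from the remaining n - 1.
bootknife <- replicate(B, {
  xj <- x[-sample.int(n, 1)]
  mean(xj[sample.int(n - 1, n, replace = TRUE)])
})

# Remedy 3: smoothed bootstrap. Add Gaussian kernel noise (the Gaussian shape
# is an assumption) with variance s^2 / n: the empirical variance with
# divisor n - 1, divided by n.
h <- sqrt(var(x) / n)
smoothed <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)] + rnorm(n, 0, h)))

# Bootstrap variance of the mean relative to s^2 / n
ratios <- sapply(list(naive = naive, n_minus_1 = n_minus_1,
                      bootknife = bootknife, smoothed = smoothed),
                 function(m) var(m) / target)
round(ratios, 2)
```

The indexing form `x[sample.int(...)]` is used instead of `sample(x, ...)` because `sample()` treats a length-one numeric vector as `1:x`, which would bite in the bootknife step when n = 2.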

All three are undefined for samples of size 1.  You need to go to some
other bootstrap, e.g. a parametric bootstrap with variability estimated
from other data.
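A sketch of the parametric-bootstrap route for the one-community-per-stratum design in the question. Everything specific here is hypothetical: the stratum means `ybar`, the weights `W`, the normal distribution, and especially `external_sd`, which stands in for variability estimated from other data.

```r
set.seed(7)

# Hypothetical data: 6 regions x {urban, rural}, one sampled community per
# stratum, each summarised by its sample mean (values are made up).
strata <- expand.grid(area = c("urban", "rural"), region = paste0("R", 1:6))
strata$ybar <- c(10.2, 8.1, 11.0, 7.6, 9.8, 8.4, 10.9, 7.9, 10.1, 8.8, 9.5, 8.0)
strata$W <- rep(1 / 12, 12)  # hypothetical population share of each stratum

# ASSUMPTION: SD of a community mean around its stratum mean, taken from
# other data (e.g. an earlier survey). It cannot be estimated from these
# data, because each stratum contains only one community.
external_sd <- 1.5

estimate <- sum(strata$W * strata$ybar)  # weighted overall mean

# Parametric bootstrap: perturb each stratum's value with normal noise
# (normality is also an assumption), then recompute the estimate.
B <- 5000
boot_par <- replicate(B, {
  ystar <- rnorm(nrow(strata), mean = strata$ybar, sd = external_sd)
  sum(strata$W * ystar)
})

c(estimate   = estimate,
  se_boot    = sd(boot_par),
  se_formula = sqrt(sum(strata$W^2) * external_sd^2))
```

For a linear estimator like this weighted mean the simulation only reproduces the closed-form standard error in the last line; the parametric bootstrap is more useful for nonlinear statistics (ratios, quantiles), where you would replace `sum(strata$W * ystar)` with the statistic of interest. The result is only as good as `external_sd`.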

Tim Hesterberg


Re: How to do bootstrap for the complex sample design?

Thomas Lumley
On Fri, Nov 5, 2010 at 3:51 AM, Tim Hesterberg <[hidden email]> wrote:

> Three remedies are:
> * draw bootstrap samples of size n-1
> * "bootknife" sampling - omit one observation (a jackknife sample), then
>  draw a bootstrap sample of size n from that
> * bootstrap from a kernel density estimate, with kernel covariance equal
>  to empirical covariance (with divisor n-1) / n.
>
> All three are undefined for samples of size 1.  You need to go to some
> other bootstrap, e.g. a parametric bootstrap with variability estimated
> from other data.
>

And the 'survey' package supplies the first option. (It also supplies
a bootstrap sample of size n that allows finite population
corrections, designed for situations with a large n and a high
sampling fraction, such as some business surveys.)
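A base-R sketch of the "size n-1" bootstrap for a stratified design, written as replicate weights in the style of Rao and Wu's rescaling bootstrap. This is not the survey package's code; in recent versions of that package the corresponding option is believed to be `type = "subbootstrap"` in `as.svrepdesign()` (check `?as.svrepdesign`). The design, weights and data are hypothetical, with 3 communities per stratum because the method needs at least 2; the helper `rw_replicate_weights()` stops on a single-PSU stratum, which is the situation in the original question.

```r
set.seed(3)

# Hypothetical stratified design: 4 strata, 3 sampled PSUs (communities) each,
# equal design weights within a stratum. All values are made up.
dat <- data.frame(
  stratum = rep(1:4, each = 3),
  y       = c(5.1, 6.3, 4.8,  7.2, 6.9, 8.1,  3.9, 4.4, 5.0,  6.0, 6.6, 5.7),
  w       = rep(c(100, 80, 120, 90), each = 3)  # hypothetical weights N_h / n_h
)

# One set of replicate weights with m_h = n_h - 1: draw n_h - 1 PSUs with
# replacement in each stratum and rescale by n_h / (n_h - 1).
rw_replicate_weights <- function(stratum, w) {
  wstar <- numeric(length(w))
  for (h in unique(stratum)) {
    rows <- which(stratum == h)
    nh <- length(rows)
    if (nh < 2) stop("stratum ", h, " has a single PSU: n_h - 1 = 0")
    counts <- tabulate(sample.int(nh, nh - 1, replace = TRUE), nbins = nh)
    wstar[rows] <- w[rows] * counts * nh / (nh - 1)
  }
  wstar
}

total_hat <- sum(dat$w * dat$y)  # estimated population total
B <- 10000
boot_totals <- replicate(B, sum(rw_replicate_weights(dat$stratum, dat$w) * dat$y))

# Textbook with-replacement stratified variance: sum_h N_h^2 s_h^2 / n_h
Nh  <- tapply(dat$w, dat$stratum, sum)
s2h <- tapply(dat$y, dat$stratum, var)
nh  <- tapply(dat$y, dat$stratum, length)
c(se_boot = sd(boot_totals), se_formula = sqrt(sum(Nh^2 * s2h / nh)))
```

In practice you would store a few hundred such weight columns and compute any statistic with each column; the standard deviation across replicates is the standard error. With one community per stratum, `nh - 1` is zero and there is nothing to resample.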

With a sample size of 1 per stratum there are no design-unbiased
estimators of the standard error, so as others have said you need
external data.

       -thomas


--
Thomas Lumley
Professor of Biostatistics
University of Auckland


Re: How to do bootstrap for the complex sample design?

Fei xu

Dear Professor Lumley,

Thank you so much for your invaluable advice!

I will digest your advice and try different methods.

Many thanks again!
 
Faye
 

> Date: Fri, 5 Nov 2010 08:24:00 +1300
> Subject: Re: [R] How to do bootstrap for the complex sample design?
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]; [hidden email]