bootstrap resampling - simplified

Laszlo
Hello there,

I have a problem concerning bootstrapping in R, specifically the resampling part of it. I will try to sum it up in a simplified way so as not to confuse anybody.

I have a small database consisting of 20 observations (basically numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).

I would like to resample this database many times for the bootstrap process under the following conditions. Firstly, every resampled database should also contain 20 observations. Secondly, numbers are selected from the above-mentioned 20 numbers with replacement. The difficult part comes now: any one number can be selected at most 5 times. To make this clear, here are a couple of examples. The resampled databases might look like the following:

(1st database)          1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
4 different numbers are chosen (1, 2, 3, 4), each selected the maximum possible 5 times.

(2nd database)          1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
Two numbers - 8 and 6 - are selected 5 times (the maximum possible), number 1 is selected 4 times, and the others are selected fewer than 4 times.

(3rd database)          1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
Number 9 is chosen the maximum possible 5 times, numbers 10, 3, 2 and 1 are each chosen 3 times, number 4 is selected twice and number 13 only once.
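
The condition can be written down as a small check. This is only a sketch: the helper name within_cap is made up for illustration, and the three vectors are the example databases above.

```r
# Hypothetical helper: TRUE if no value occurs more than max_rep times in x
within_cap <- function(x, max_rep = 5) max(table(x)) <= max_rep

db1 <- c(1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4)
db2 <- c(1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1)
db3 <- c(1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1)

# each example database has 20 observations and respects the cap
sapply(list(db1, db2, db3), length)
sapply(list(db1, db2, db3), within_cap)

# a resample made of twenty 1s would violate it
within_cap(rep(1, 20))
```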

...

Does anybody know how to implement my "tricky" condition - that one number can be selected 5 times at most - in one of the R functions? Are the 'boot' and 'bootstrap' packages capable of managing this? I guess they are, I just couldn't figure it out yet...

Thanks very much! Best regards,
Laszlo Bodnar


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: bootstrap resampling - simplified

djmuseR
Hi:

On Tue, Mar 1, 2011 at 8:22 AM, Bodnar Laszlo EB_HU <[hidden email]> wrote:

> Hello there,
>
> I have a problem concerning bootstrapping in R - especially focusing on the
> resampling part of it. I try to sum it up in a simplified way so that I
> would not confuse anybody.
>
> I have a small database consisting of 20 observations (basically numbers
> from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
>

To check how often an ordinary bootstrap sample contains some value five or more times, I ran the following:
bootmat <- matrix(sample(1:20, 200000, replace = TRUE), nrow = 10000)
sum(apply(bootmat, 1, function(x) any(table(x) >= 5)) )
[1] 492

It's about 0.05. A quick-and-dirty 'solution' would be to oversample by at
least 5% (let's do 10% just to be on the safe side) and then pick out the
first B samples that meet the criterion. In the above example, we could draw
11000 samples instead and keep the first 10000 good ones:

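bootmat <- matrix(sample(1:20, 220000, replace = TRUE), nrow = 11000)
# TRUE for rows in which some value occurs 5 or more times; the original
# request allows exactly 5 occurrences, so a strict cap would test '> 5'
badsamps <- apply(bootmat, 1, function(x) any(tabulate(x) >= 5))
# badsamps is a logical vector, so negate it with '!' to keep the good rows
bootfin <- bootmat[!badsamps, ][1:10000, ]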

Time:
   user  system elapsed
   0.28    0.00    0.28

(Note 1: Using table instead of tabulate took 4.22 seconds on my machine -
tabulate is much faster.)
(Note 2: In the call above, there were 539 bad samples, so the 5% ballpark
estimate seems plausible.)
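
For comparison, here is a minimal sketch that measures the rejection share under both readings of the cap: '>= 5' as in the code above, and '> 5', which keeps samples where a value occurs exactly five times (the original post allows that). The seed and the names maxcount and bootfin_strict are illustrative assumptions, and no particular counts are claimed.

```r
set.seed(2011)  # arbitrary seed, only so that a run can be repeated
bootmat <- matrix(sample(1:20, 20 * 11000, replace = TRUE), nrow = 11000)

# largest number of times any single value occurs in each row
maxcount <- apply(bootmat, 1, function(x) max(tabulate(x, nbins = 20)))

mean(maxcount >= 5)   # share rejected if exactly 5 repeats count as bad
mean(maxcount > 5)    # share rejected under the strict "at most 5" cap

# rows that satisfy "no value more than 5 times"
bootfin_strict <- bootmat[maxcount <= 5, , drop = FALSE]
nrow(bootfin_strict)
```

Every row rejected by '> 5' is also rejected by '>= 5', so the strict cap can only throw away the same number of samples or fewer.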

This is a simple application of the accept-reject criterion. I don't know
how large 'many' is to you, but 10,000 seems to be a reasonable starting
point. I ran it again for 1,000,000 such samples, and the completion time
was
   user  system elapsed
  36.74    0.31   37.15
so the processing time grows somewhat faster than linearly. If your
simulations are of this magnitude and are to be run repeatedly, you probably
need to write a function to improve the speed and to avoid the waste
produced by a rejection sampling approach. If this is a one-off deal,
perhaps the above is sufficient.

HTH,
Dennis


Re: bootstrap resampling - simplified

Vokey, John
In reply to this post by Laszlo

Laszlo,
  Create a vector consisting of 5 of each number.  Then, for each sample, scramble the order of the items in the vector, and select the first 20.
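
A minimal sketch of that recipe in R, assuming the data are the integers 1:20; the names pool, one_sample, B and boot_idx are illustrative only.

```r
pool <- rep(1:20, each = 5)        # 100 items: five copies of each value

# scramble the pool and keep the first 20 items
one_sample <- sample(pool)[1:20]   # same idea as sample(pool, 20)

# B such resamples, one per row
B <- 1000
boot_idx <- t(replicate(B, sample(pool, 20)))
dim(boot_idx)
```

By construction no value can appear more than 5 times in a row of boot_idx. The draws come from the pool without replacement, so they are not independent draws from 1:20.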


--
Please avoid sending me Word or PowerPoint attachments.
See <http://www.gnu.org/philosophy/no-word-attachments.html>

-Dr. John R. Vokey


Re: bootstrap resampling - simplified

Jonathan P Daily
I will point out again that sampling from a five-fold replicate of 1:20 is
not the same as resampling with replacement, although I made an error in
reporting the probabilities: P(x2 = 1 | x1 = 1) is 4/99, not 4/100.
When sampling with replacement, P(x2 = 1 | x1 = 1) = P(x2 = 1 | x1 != 1) =
1/20.
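
The figures follow from counting what is left in the pool after the first draw. A small sketch of that arithmetic, with variable names chosen only for illustration:

```r
pool <- rep(1:20, times = 5)     # five-fold replicate of 1:20, 100 items
n_ones <- sum(pool == 1)         # 5 copies of the value 1

# drawing from the pool without replacement: 99 items remain after x1
p_same  <- (n_ones - 1) / (length(pool) - 1)   # P(x2 = 1 | x1 = 1)  = 4/99
p_other <- n_ones / (length(pool) - 1)         # P(x2 = 1 | x1 != 1) = 5/99

# ordinary sampling with replacement from 1:20: x2 does not depend on x1
p_iid <- 1 / 20

c(p_same = p_same, p_other = p_other, p_iid = p_iid)
```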
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly


Re: bootstrap resampling - simplified

Giovanni Petris
In reply to this post by Vokey, John
But this seems to me to be equivalent to sample(rep(1:20, 5), 20),
which I suggested previously and which was pointed out to be wrong....

Giovanni


Re: bootstrap resampling - simplified

Bert Gunter
In reply to this post by Jonathan P Daily
Folks:

On Wed, Mar 2, 2011 at 10:32 AM, Jonathan P Daily <[hidden email]> wrote:
> I will point out again that sampling a five-fold replicate of 1:20 is not
> the same as resampling with replacement,

-- Correct. In sampling with replacement from 1:20 there is positive
probability of getting all 1's or all 2's, etc. The poster
specifically said that he wanted 0 probability of such results. So,
obviously, the poster does NOT want to "sample with replacement from
1:20." What he does want (I think) is a re-sample of size n from the
set of all **vectors** of length 20, each element of which is an
integer from 1 to 20, and in which no individual value occurs more
than 5 times. Of course I'm just interpreting/paraphrasing the
original post (if I got it right), but I think doing so makes the
nature of the task clearer: one needs to find some way to sample with
replacement from the space of all such **sequences**.

I think it is now clear that one may do so by rejection sampling, i.e.
sample with replacement from 1:20 and throw away any sequences that
fail the at-most-5 criterion. Each sequence that remains is a sample of
size 1 from the population of sequences that satisfy the poster's
criteria (in theory, anyway; this might tax a pseudo RNG in practice).
A collection of n such sequences is a bootstrap sample from this
population. I **think** that's what the poster wants -- and what
others have already provided. However, maybe this clarifies why it
works.
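
A minimal sketch of rejection sampling that keeps drawing until exactly B accepted sequences are available, so no oversampling margin has to be guessed. The function name sample_capped and its arguments are hypothetical, not part of any package.

```r
# Hypothetical helper: B resamples of size n from 1:k, one per row,
# rejecting any resample in which a value occurs more than max_rep times.
sample_capped <- function(B, n = 20, k = 20, max_rep = 5) {
  out <- matrix(NA_integer_, nrow = B, ncol = n)
  filled <- 0
  while (filled < B) {
    need <- B - filled
    # ordinary sampling with replacement: one candidate sequence per row
    cand <- matrix(sample.int(k, need * n, replace = TRUE), nrow = need)
    # accept a row only if its most frequent value respects the cap
    ok <- apply(cand, 1, function(x) max(tabulate(x, nbins = k)) <= max_rep)
    n_ok <- sum(ok)
    if (n_ok > 0) {
      out[filled + seq_len(n_ok), ] <- cand[ok, , drop = FALSE]
      filled <- filled + n_ok
    }
  }
  out
}

set.seed(1)                       # arbitrary seed for repeatability
boot_idx <- sample_capped(1000)   # 1000 accepted sequences of length 20
dim(boot_idx)
```

Each accepted row can then be used as an index vector into the 20 observations.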

If I have made any error in this, **Please** post a message pointing
out my error. I sometimes get confused about this stuff, too.

Cheers,
Bert








--
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml


Re: bootstrap resampling - simplified

Jonathan P Daily
I apologize if I was not clear in my response. I only mentioned x1 and x2 in
my example, and did not make clear that I was aware that P(x6 = 1 | x1..5 =
1) = 0 in the original request. I also see that, if he meant that he wanted
to sample with replacement from the set of sequences, then sample(rep(1:20,
5), 20) is fine for generating said sequences. My interpretation was that
each sequence itself should be sampled with replacement until a value's
frequency hits 5, whereupon that value is no longer replaced. Hence my
suggestion of:

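bigsamp <- sample(1:20, 100, replace = TRUE)
# positions of the first 5 occurrences of each value, put back in draw order;
# the first 20 of those positions give the capped sample
idx <- sort(unlist(lapply(1:20, function(x) head(which(bigsamp == x), 5))))[1:20]
samp <- bigsamp[idx]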

I apologize for my lack of clarity, though after reading the original post
I'm not sure which solution the OP was looking for.
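
The same interpretation can also be written as an explicit draw-by-draw loop. This is only a sketch under that reading of the request; the function name sample_sequential_cap and its arguments are made up for illustration.

```r
# Hypothetical helper: draw n values from 1:k one at a time. A value stays
# available until it has been drawn max_rep times, then it is dropped.
sample_sequential_cap <- function(n = 20, k = 20, max_rep = 5) {
  stopifnot(n <= k * max_rep)      # otherwise the pool runs out
  counts <- integer(k)             # how often each value has been drawn
  out <- integer(n)
  for (i in seq_len(n)) {
    avail <- which(counts < max_rep)              # values still below the cap
    pick <- avail[sample.int(length(avail), 1)]   # uniform over those values
    counts[pick] <- counts[pick] + 1L
    out[i] <- pick
  }
  out
}

set.seed(42)                       # arbitrary seed for repeatability
samp <- sample_sequential_cap()
table(samp)
```

Indexing avail with sample.int avoids the special behaviour of sample() when it is handed a single number.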

Cheers,
Jon