random sampling with levels and with replacement

random sampling with levels and with replacement

taby gathoni
Dear all,
I have a dataset of about 400 records with, among other variables, one variable that has two levels: 40 bad and 360 good. How do I come up with 10 random samples, drawn with replacement, that have the same composition as the main sample, i.e. 40 bad and 360 good? I recently discovered that the random samples I generate don't maintain the ratio. My code is:

mysample <- final[sample(1:nrow(final), 400,replace=TRUE),]

This does not give me the ratio of 40 bad to 360 good. Can anyone give me some pointers, please?



Thanks,
Taby




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: random sampling with levels and with replacement

Daniel Malter
If you want perfect equality, split the data into good and bad rows and sample from the two subsets separately.

On average, however, random sampling from the entire data will reproduce the proportion of good and bad in the data.
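The split-and-sample idea can be sketched as a small helper that resamples within each level. This is only an illustration: the function name `stratified_resample`, the toy data frame, and the column names `id` and `status` are assumptions, not taken from the original post.

```r
# Draw one bootstrap sample that keeps each level's count fixed.
# `data` and `group` are placeholder names, not from the thread.
stratified_resample <- function(data, group) {
  # split the row numbers by level of the grouping column
  rows_by_level <- split(seq_len(nrow(data)), data[[group]])
  # resample within each level, with replacement, keeping the level's size;
  # sample.int() on the length avoids sample()'s surprise for length-1 input
  picked <- lapply(rows_by_level, function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Toy data standing in for the poster's `final` (assumed layout)
final <- data.frame(id = 1:400,
                    status = c(rep("good", 360), rep("bad", 40)))
one_sample <- stratified_resample(final, "status")
table(one_sample$status)
```

Because each level is resampled to its own size, the counts per level match the original data by construction, while the individual rows within a level can repeat.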

hth,
Daniel

Reply | Threaded
Open this post in threaded view
|

Re: random sampling with levels and with replacement

taby gathoni

Thanks Daniel,
It's a long way, but it will work.

--- On Fri, 4/8/11, Daniel Malter <[hidden email]> wrote:

From: Daniel Malter <[hidden email]>
Subject: Re: [R] random sampling with levels and with replacement
To: [hidden email]
Date: Friday, April 8, 2011, 10:08 AM


Re: random sampling with levels and with replacement

PIKAL Petr
In reply to this post by taby gathoni
Hi

[hidden email] wrote on 08.04.2011 09:31:44:

> Dear all,
> i have a dataset of about 400 records , with a variable that has  two levels
> 40 bad and 360 good among other variables,how do i come up  with10 random
> samples that have the composition of as the main sample  but maintaining the
> 40 bad 360 good with replacement, i recently discovered that my random samples
> generated dont maintain the ratio. My code is as  :
>
> mysample <- final[sample(1:nrow(final), 400,replace=TRUE),]
>
> does not give me the ratio of 40 bad and 360 good can anyone give me some
> pointers please?

If you draw 400 items with replacement, you will get the exact proportion
of good and bad only by chance. In each draw your chance of getting a bad
one is 40/400, but that does not mean that 400 random picks will give you
exactly 40 bad items.
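How rarely the exact split occurs can be worked out directly, since the number of bad rows in 400 draws with replacement follows a binomial distribution with size 400 and probability 40/400. A small sketch (the variable names are only illustrative):

```r
# Number of "bad" rows in 400 draws with replacement ~ Binomial(400, 40/400)
n_draws <- 400
p_bad   <- 40 / 400

# Probability of getting exactly 40 bad rows in one resample
p_exact <- dbinom(40, size = n_draws, prob = p_bad)

# Probability of landing within +/- 5 of the target (35 to 45 bad rows)
p_near <- sum(dbinom(35:45, size = n_draws, prob = p_bad))

print(c(exact = p_exact, near = p_near))
```

The exact 40/360 split turns up in well under one resample in ten, which is why plain resampling of all rows rarely reproduces it.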

If you just want to shuffle your rows, use sampling without replacement.

mysample <- final[sample(1:nrow(final), 400),]

In that case you get the same data but with random row order.
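A quick way to convince yourself of this, using a toy stand-in for `final` (the column names `id` and `status` are assumptions made for the example):

```r
# Toy stand-in for the poster's data frame `final` (column names are assumed)
final <- data.frame(id = 1:400,
                    status = c(rep("good", 360), rep("bad", 40)))

set.seed(1)  # only to make the shuffle reproducible
shuffled <- final[sample(nrow(final)), ]

# Same rows and same counts as before; only the order differs
table(shuffled$status)
identical(sort(shuffled$id), final$id)
```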

But if you do sample with replacement, the proportion of good and bad
items will match the original only on average. You can check this, e.g., with

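x <- c(rep("g", 360), rep("b", 40))
res <- rep(NA, 1000)
for (i in 1:1000) {
  # table() sorts its names, so y[1] is the "b" count and y[2] the "g" count
  y <- table(sample(x, 400, replace = TRUE))
  res[i] <- y[1] / y[2]
}
# plot once, after the loop has filled res
hist(res)
abline(v = 40/360, col = 2)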

Regards
Petr




Re: random sampling with levels and with replacement

Andreas Borg-2
In reply to this post by taby gathoni
Hi,

I am not perfectly sure what you want to do, but here is what I would do
to maintain the good/bad ratio in the sample (as Daniel posted, split the
data and sample from the groups):

df <- data.frame(V1 = 1:400, V2 = c(rep("good",360), rep("bad",40)))
isGood <- which(df$V2=="good")
isBad <- which(df$V2=="bad")
sampleGood <- df[sample(isGood, replace=TRUE),]
sampleBad <- df[sample(isBad, replace=TRUE),]
summary(rbind(sampleGood, sampleBad))
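
The original question asked for 10 such samples. One way to get them, as a sketch that reuses the toy data and index vectors above (the names `n_samples` and `samples` are made up for the example), is to wrap the two draws in `lapply`:

```r
# Same toy data as above, rebuilt here so the snippet runs on its own
df <- data.frame(V1 = 1:400, V2 = c(rep("good", 360), rep("bad", 40)))
isGood <- which(df$V2 == "good")
isBad  <- which(df$V2 == "bad")

# Build 10 resamples; each one draws 360 good and 40 bad rows with replacement
n_samples <- 10
samples <- lapply(seq_len(n_samples), function(i) {
  rbind(df[sample(isGood, replace = TRUE), ],
        df[sample(isBad,  replace = TRUE), ])
})

# Every resample should show bad = 40 and good = 360
sapply(samples, function(s) table(s$V2))
```

Each element of `samples` is a 400-row data frame, so `samples[[1]]` can be used wherever `mysample` was used before.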

Please include a more specific example with test data (for "final" in
this case) next time.

Best regards,

Andreas




--
Andreas Borg
Medizinische Informatik

UNIVERSITÄTSMEDIZIN
der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de

Telefon +49 (0) 6131 175062
E-Mail: [hidden email]


Re: random sampling with levels and with replacement

taby gathoni



Andreas,

Thanks a lot. I combined your suggestion with the others given on r-help and it worked.







--- On Fri, 4/8/11, Andreas Borg <[hidden email]> wrote:

From: Andreas Borg <[hidden email]>
Subject: Re: [R] random sampling with levels and with replacement
To: [hidden email]
Cc: "R help" <[hidden email]>
Date: Friday, April 8, 2011, 11:13 AM
