how to randomly select the samples with different probabilities for different classes?

Marna Wagley
Hi R users,
I have samples with covariates that fall into different classes, and I want
to select samples from the different groups with different probabilities.
For example, I have 22 samples in 3 classes:
groupA has 8 samples
groupB has 8 samples
groupC has 6 samples

I want to select a total of 14 samples from the 22 samples, of which 40% of
the 14 samples should be in groups A and B and 60% of the 14 samples should
be in group C.

Could you help me select the samples under those conditions? Sample data
are below:

dat<-structure(list(sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L,
31L, 70L, 45L, 24L, 80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L,
65L, 88L), group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor")), .Names = c("sampleID", "group"
), class = "data.frame", row.names = c(NA, -22L))

thanks,
  MW

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to randomly select the samples with different probabilities for different classes?

Rui Barradas
Hello,

If 60% of the 14 samples come from group C, then 8.4 samples would have to
come from a group with only 6 elements. Do you want sampling with
replacement? If so, maybe the following will do.


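perc <- c(0.4, 0.6)  # shares for groups A and B (combined) and for group C
# Row indices split into non-C (FALSE) and C (TRUE)
tmp <- split(seq_len(nrow(dat)), dat$group == "C")
# Draw row indices within each part, with replacement
idx <- lapply(seq_along(tmp), function(i)
  tmp[[i]][sample.int(length(tmp[[i]]), round(perc[i] * 14), replace = TRUE)])
idx <- unlist(idx)
dat[idx, ]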
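
The 8.4-versus-6 mismatch mentioned above can be checked directly. What follows is only a minimal sketch, not part of the original reply: it rebuilds the posted data in a shorter form, and the names n_total, available, wanted and needs_replacement are made up for illustration. It assumes that "groups A and B" means the two groups pooled together.

```r
# Rebuild the data from the original post (same values, shorter form)
dat <- data.frame(
  sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L, 31L, 70L, 45L, 24L,
               80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L, 65L, 88L),
  group = factor(rep(c("A", "B", "C"), times = c(8, 8, 6)))
)

n_total <- 14                      # total number of samples requested
perc <- c(AB = 0.4, C = 0.6)       # requested shares (A and B pooled)

# Rows available in each part, and rows wanted after rounding
available <- c(AB = sum(dat$group != "C"), C = sum(dat$group == "C"))
wanted <- round(perc * n_total)

# TRUE where the quota exceeds the group size, so replacement is unavoidable
needs_replacement <- wanted > available
print(rbind(available, wanted, needs_replacement))
```

With these numbers only group C needs replacement; the pooled A and B rows are enough to be drawn without it.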

Hope this helps,

Rui Barradas

On 07-12-2016 11:58, Marna Wagley wrote:

> I want to select a total 14 samples from 22 samples, in which  40% of the
> 14 samples should be in groups A and B, 60% of the 14 samples should be in
> the group C.

Re: how to randomly select the samples with different probabilities for different classes?

Jim Lemon-4
Hi Marna,
If we assume a sample size of 1, something like this:

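# 6 rows from groups A and B: ceiling(5.6)
dat[sample(which(dat$group != "C"), ceiling(14 * 0.4), replace = TRUE), ]
# 8 rows from group C: floor(8.4)
dat[sample(which(dat$group == "C"), floor(14 * 0.6), replace = TRUE), ]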

Then just step through the two subsets to access your samples.

One problem is that you will not get exactly 40% or 60%, because 14*0.4 =
5.6 and 14*0.6 = 8.4 are not whole numbers, which is why I had to put the
"ceiling" and "floor" functions to work. Also, you will have to sample with
replacement, as you would otherwise exhaust the 6 rows of the "C" group.
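
To collect both draws in a single data frame, something along these lines might work. It is only a sketch under assumptions: draw_rows is a hypothetical helper, not an existing R function, the data are rebuilt from the original post in a shorter form, and the counts 6 and 8 are the rounded quotas discussed above. Unlike the two lines above, it uses replacement only when a quota is larger than the group, so the A and B rows are drawn without replacement.

```r
# Rebuild the data from the original post (same values, shorter form)
dat <- data.frame(
  sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L, 31L, 70L, 45L, 24L,
               80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L, 65L, 88L),
  group = factor(rep(c("A", "B", "C"), times = c(8, 8, 6)))
)

# Hypothetical helper: pick k row numbers among the rows where in_group is TRUE
draw_rows <- function(in_group, k) {
  rows <- which(in_group)
  # Sample positions rather than values, so a single-row group is handled too;
  # replacement is switched on only when k exceeds the number of rows
  rows[sample.int(length(rows), k, replace = k > length(rows))]
}

set.seed(1)  # only to make the draw repeatable
idx <- c(draw_rows(dat$group != "C", ceiling(14 * 0.4)),  # 6 rows from A and B
         draw_rows(dat$group == "C", floor(14 * 0.6)))    # 8 rows from C
picked <- dat[idx, ]
print(table(picked$group))
```

Rows of group C will appear more than once in picked, since 8 draws are taken from 6 rows.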

Jim


On Wed, Dec 7, 2016 at 10:58 PM, Marna Wagley <[hidden email]> wrote:

> I want to select a total 14 samples from 22 samples, in which  40% of the
> 14 samples should be in groups A and B, 60% of the 14 samples should be in
> the group C.