Sample rows in data frame by subsets

Chris Stubben
Hi,

I need to resample rows in a data frame within subsets defined by a factor:

L3 <- LETTERS[1:3]
d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE),
                stringsAsFactors=TRUE)
d
    x  y fac
1  1  1   A
2  1  2   A
3  1  3   A
4  1  4   A
5  1  5   C
6  1  6   C
7  1  7   B
8  1  8   A
9  1  9   C
10 1 10   A

I have seen this used to sample rows with replacement:

d[sample(nrow(d), replace=TRUE), ]

     x  y fac
7   1  7   B
2   1  2   A
1   1  1   A
3   1  3   A
2.1 1  2   A
10  1 10   A
8   1  8   A
9   1  9   C
1.1 1  1   A
8.1 1  8   A


but I would like to resample within each level of fac, keeping the
original number of rows per level:

summary(d$fac)
A B C
6 1 3


rbind(subset(d, fac=="A")[sample(6, replace=TRUE), ],
      subset(d, fac=="B")[sample(1, replace=TRUE), ],
      subset(d, fac=="C")[sample(3, replace=TRUE), ])

     x  y fac
2   1  2   A
3   1  3   A
3.1 1  3   A
1   1  1   A
10  1 10   A
1.1 1  1   A
7   1  7   B
5   1  5   C
6   1  6   C
5.1 1  5   C


Is there an easy way to do this in one step or with a short function?  I
have lots of data frames to resample.

Thanks,

Chris


--
-----------------
Chris Stubben

Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: Sample rows in data frame by subsets

Liaw, Andy
Here's one way, if you want to do it in one command:

do.call("rbind", lapply(split(d, d$fac),
        function(x) x[sample(nrow(x), nrow(x), replace=TRUE), ]))

split() splits the data frame into a list of data frames, one per level of
d$fac.  The lapply() call returns a list of the same shape, with each
component replaced by a resample (with replacement, same number of rows) of
that component.  do.call("rbind", ...) then stacks the pieces back into a
single data frame.
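
Since you have lots of data frames, the same idea could be wrapped in a
small function.  This is only a sketch: the name resample_by_group and its
arguments are made up for illustration and do not come from any package,
and the grouping column is assumed to be passed by name as a string.

```r
# Hypothetical helper (illustrative name, not from a package):
# resample rows of `df` with replacement within each level of column `by`,
# keeping the original number of rows per level.
resample_by_group <- function(df, by) {
  pieces <- split(df, df[[by]])                 # one data frame per level
  resampled <- lapply(pieces, function(x) {
    idx <- sample.int(nrow(x), replace = TRUE)  # row indices within the piece
    x[idx, , drop = FALSE]
  })
  do.call("rbind", resampled)                   # stack the pieces back together
}

set.seed(1)  # only to make the example repeatable
d <- data.frame(x = 1, y = 1:10,
                fac = factor(c("A", "A", "A", "A", "C", "C", "B", "A", "C", "A")))
r <- resample_by_group(d, "fac")
table(r$fac)  # per-level counts should match table(d$fac)
```

The row names of the result are prefixed with the level names by rbind;
rownames(r) <- NULL resets them if that is unwanted.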

Andy
