Randomly sampling subsets of dataframe variable


Hosack, Michael
Fellow R users,

I am stumped on what would seem to be something fairly simple.
I have a dataframe that has a variable named 'WEEK' that takes
the numbers 1:26 (a 26-week period), with each number repeated
five times consecutively (once for each weekday, Monday through
Friday). Ex. 111112222233333.....2626262626. I would like to
randomly extract two weekdays per five-day week for each of the
26 weeks and store this data as a separate dataframe. I have
been unable to get the sample function to work properly.
I have also tried using the runif function to assign a random
number to each row of my dataframe, sorting the dataframe first
by week number and then by random number, and finally selecting
the first two elements from each week subset (26 weeks total,
giving 52 randomly selected values). I can't figure out how
to select the first two elements. My goal is to randomly
select two weekdays per week (without replacement) for each of
26 consecutive weeks. Any advice would be greatly appreciated.

Thank you,

Mike

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
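
The runif-and-sort approach described in the question can be finished in base R by numbering the rows within each week after sorting. This is only a sketch under assumptions: the data frame name `dat`, the `DAY` column, the `key` column and the `rank_in_week` vector are made up for illustration; only `WEEK` comes from the original post.

```r
# Hypothetical stand-in for the poster's data: 26 weeks x 5 weekdays.
dat <- data.frame(WEEK = rep(1:26, each = 5), DAY = rep(1:5, times = 26))

# Step 1: attach one uniform random number to every row.
dat$key <- runif(nrow(dat))

# Step 2: sort by week, then by the random key within each week.
sorted <- dat[order(dat$WEEK, dat$key), ]

# Step 3: number the rows within each week (1..5) and keep the first two.
rank_in_week <- ave(seq_len(nrow(sorted)), sorted$WEEK, FUN = seq_along)
picked <- sorted[rank_in_week <= 2, c("WEEK", "DAY")]
```

Because the keys are independent uniform draws, the two rows with the smallest keys in a week should form a simple random sample of that week's days without replacement. Calling set.seed() first would make the draw repeatable.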

Re: Randomly sampling subsets of dataframe variable

Phil Spector
Mike -
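    Perhaps these suggestions will be helpful:

# example data: 26 weeks, 5 weekdays each
somedata = data.frame(week=rep(1:26,rep(5,26)),day=rep(1:5,26))

# by(): sample 2 rows from each week's block, then stack the pieces
res = by(somedata,somedata$week,function(x)x[sample(1:nrow(x),2),])
do.call(rbind,res)

or

# split() + lapply(): same idea with an explicit list of per-week data frames
do.call(rbind,lapply(split(somedata,somedata$week),
               function(x)x[sample(1:nrow(x),2),]))

or

# tapply(): sample 2 row numbers per week and index the full data frame
do.call(rbind,tapply(1:nrow(somedata),list(somedata$week),
                      function(x)somedata[sample(x,2),]))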


  - Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  [hidden email]
On Fri, 12 Mar 2010, Hosack, Michael wrote:

> My goal is to randomly
> select two weekdays per week (without replacement) for each of
> 26 consecutive weeks. Any advice would be greatly appreciated.


Re: Randomly sampling subsets of dataframe variable

David Winsemius
In reply to this post by Hosack, Michael

On Mar 12, 2010, at 3:06 PM, Hosack, Michael wrote:

> My goal is to randomly
> select two weekdays per week (without replacement) for each of
> 26 consecutive weeks. Any advice would be greatly appreciated.

 > replicate(26,sample(1:5, 2))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]    4    1    3    2    3    1    3    5    1     1     2     4     2
[2,]    1    3    4    1    2    3    4    3    3     2     4     5     1
     [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,]     5     1     1     5     2     4     5     4     5     3     3     4     4
[2,]     2     3     5     1     4     2     2     1     2     1     1     1     2

 > replicate(26,sample(1:5, 2))[,1]
[1] 1 4
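
The matrix above holds weekday numbers only, so one more step is needed to pull the matching rows out of a data frame. The following is a sketch, assuming the data are sorted by week and weekday with exactly five rows per week; the names `dat`, `DAY`, `days`, `weeks`, `rows` and `picked` are illustrative and not from the thread.

```r
# Hypothetical stand-in for the poster's data, sorted by week then weekday.
dat <- data.frame(WEEK = rep(1:26, each = 5), DAY = rep(1:5, times = 26))

# 2 x 26 matrix: column w holds the two weekdays drawn for week w.
days <- replicate(26, sample(1:5, 2))

# Week number matching each entry of `days` (column-major order).
weeks <- rep(1:26, each = 2)

# Row of `dat` for each (week, day) pair; valid only for the regular layout above.
rows <- (weeks - 1) * 5 + as.vector(days)
picked <- dat[sort(rows), ]
```

If some weeks had fewer than five rows, or the rows were not in this order, the arithmetic for `rows` would no longer hold and a grouped approach would be the safer choice.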

--
David Winsemius, MD
West Hartford, CT


Re: Randomly sampling subsets of dataframe variable

Stephan Kolassa
In reply to this post by Hosack, Michael
Hi Mike,

take an index vector that selects Monday and Tuesday out of each week,
and then run a restricted random permutation on this vector which only
permutes indices within each week. rperm() is in the sna package.

library(sna)
foo <- rep(c(TRUE,TRUE,FALSE,FALSE,FALSE),26)
your.data[foo[rperm(rep(seq(1,26),each=5))],]

HTH,
Stephan
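
The same within-week shuffling idea can be sketched in base R for readers who do not have sna installed. Note that this swaps rperm() for ave() with a small shuffling function; it is not the sna implementation, and the `WEEK`/`DAY` columns and the names `shuffled` and `picked` are assumptions made for illustration.

```r
# Hypothetical stand-in for the poster's data: 26 weeks x 5 weekdays.
your.data <- data.frame(WEEK = rep(1:26, each = 5), DAY = rep(1:5, times = 26))

# Mask that picks the first two rows (Monday, Tuesday) of every week.
foo <- rep(c(TRUE, TRUE, FALSE, FALSE, FALSE), 26)

# Shuffle the mask separately inside each week, so that every week
# still has exactly two TRUE entries, now at random positions.
shuffled <- as.logical(
  ave(foo, your.data$WEEK, FUN = function(v) v[sample.int(length(v))])
)

picked <- your.data[shuffled, ]
```

Indexing with sample.int(length(v)) avoids the special behaviour of sample() when it is given a single number.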



Re: Randomly sampling subsets of dataframe variable

djmuseR
In reply to this post by Hosack, Michael
Hi:

A ddply solution:

library(plyr)
somedata = data.frame(week=rep(1:26,rep(5,26)),day=rep(1:5,26))
# sample two rows out of five per week
daysamp <- function(x) x[sample(1:5, 2), ]
# Ram it through ddply:
ddply(somedata, .(week), daysamp)

First part of output:
   week day
1     1   4
2     1   3
3     2   2
4     2   1
5     3   4
6     3   1
7     4   1
8     4   5

(52 rows in all, as expected)

HTH,
Dennis


Re: Randomly sampling subsets of dataframe variable

Chuck Cleland
In reply to this post by Hosack, Michael
On 3/12/2010 3:06 PM, Hosack, Michael wrote:

> My goal is to randomly
> select two weekdays per week (without replacement) for each of
> 26 consecutive weeks. Any advice would be greatly appreciated.

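# example data: one row per weekday, with some measurement X
DF <- data.frame(WEEK = rep(1:26, each=5), DAY = rep(1:5, 26),
                 X = runif(5*26))

# two randomly chosen days (without replacement) for each week
DF2 <- data.frame(DAY = c(replicate(26, sample(5, 2, replace=FALSE))),
                  WEEK = rep(1:26, each=2))

# keep only the rows of DF whose WEEK/DAY pair was sampled
new.DF <- merge(DF, DF2, all=FALSE)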


--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


Re: Randomly sampling subsets of dataframe variable

Ali Tofigh
In reply to this post by Hosack, Michael
I would just get row indices:

row.indices <- as.vector(sapply(0:25 * 5 + 1,
                                function(x) sort(sample(x:(x+4), 2))))
new.data.frame <- your.data.frame[row.indices, ]

Cheers,
/Ali
