Randomly split a sample in two equal subsamples

yoan
Dear all,

I would like to randomly split a sample into two equally large
subsamples. The sample data is stored as a matrix with each row
representing an individual and each column representing some variable
(e.g., name, age, sex, etc.); the first row contains the names of the
variables; the first column contains the individual number (1:n, for n
individuals); the number of individuals is even (so, the overall
number of rows is odd).

I found similar threads (like "random subset"), but I don't know how
to apply the information from them in my case. Could somebody help me
a little bit? Thanks in advance!

Yoan


Re: Randomly split a sample in two equal subsamples

Wu Gong
Hi Yoan,

Please try ?sample.

Suppose the observations have ids 1:n, where n is even, and you want to randomly split them into two subsamples. The following code should work:

n <- 20
one.sample <- sort(sample(1:n, n/2))
another.sample <- (1:n)[-one.sample]
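
To make the split reproducible, the random seed can be fixed first. A minimal sketch, assuming an arbitrary seed of 42 and using setdiff() as an alternative to negative indexing; the two checks at the end only confirm that the halves do not overlap and together cover all ids:

```r
set.seed(42)                                # arbitrary seed, only for reproducibility
n <- 20
one.sample <- sort(sample(1:n, n/2))        # n/2 ids drawn without replacement
another.sample <- setdiff(1:n, one.sample)  # the remaining n/2 ids

# The two halves share no id and together cover 1:n
no.overlap <- length(intersect(one.sample, another.sample)) == 0
covers.all <- identical(sort(c(one.sample, another.sample)), 1:n)
```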


Good luck.

Wu

Re: Randomly split a sample in two equal subsamples

yoan
Thanks, but I just don't know how to translate that to a dataset with
rows and columns.
Initially, I was thinking about something like this:

# Create some data:
a <- c(10,20,15,43,76,41,25,46)
b <- factor(c("m", "w", "m", "w", "m", "w", "m", "w"))
c <- c(2,5,8,3,6,1,5,6)
number <- c(1:8)
myframe <- data.frame(a,b,c, number)

# Randomly sample a subset of "numbers":
v1 <- sample(number, 4, replace=FALSE)
v2 <- number[-v1]

# Use the "subset" command like this:
firsthalf <- subset(myframe, number=v1)

Of course, the last line doesn't work. Is this approach wrong in
general, or is it just my syntax that is wrong?

Re: Randomly split a sample in two equal subsamples

Wu Gong
firsthalf <- myframe[v1,]

or

firsthalf <- subset(myframe, number %in% v1)
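
The second half can be obtained the same way. A sketch reusing the example data from the earlier post; the names secondhalf and halves are only illustrative, and indexing rows by v1 assumes the drawn numbers are row positions:

```r
# Example data from the earlier post
a <- c(10, 20, 15, 43, 76, 41, 25, 46)
b <- factor(c("m", "w", "m", "w", "m", "w", "m", "w"))
c <- c(2, 5, 8, 3, 6, 1, 5, 6)
number <- 1:8
myframe <- data.frame(a, b, c, number)

# Draw half of the row positions at random
v1 <- sample(nrow(myframe), nrow(myframe) / 2)

firsthalf  <- myframe[v1, ]    # rows that were drawn
secondhalf <- myframe[-v1, ]   # all remaining rows

# Alternative: shuffle a two-level group label and split on it
halves <- split(myframe, sample(rep(1:2, length.out = nrow(myframe))))
```

With split(), halves is a list of two data frames, so halves[[1]] and halves[[2]] play the roles of firsthalf and secondhalf.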

Re: Randomly split a sample in two equal subsamples

yoan
Thanks, it works!