Sample of a subsample

David Studer
Hello everybody!

I have the following problem: I'd like to select a sample from a subsample
in a dataset. Actually, I don't want to select it, but to create a new
variable sampleNo that indicates which sample (one or two) a case
belongs to.

Let's suppose I have a dataset containing 40 cases:

data <- data.frame(var1=seq(1:40), var2=seq(40,1))

The first sample (n=10) I drew like this:

data$sampleNo <- 0
idx <- sample(seq(1,nrow(data)), size=10, replace=F)
data[idx,]$sampleNo <- 1

Now (and here my problems start), I'd like to draw a second sample (n=10).
But this sample should be drawn only from the cases that don't belong to
the first sample. *Additionally, "var1" should be an even number.*

So sampleNo should be 0 for cases that were not drawn at all, 1 for cases
that belong to the first sample and 2 for cases belonging to the second
sample (= sampleNo equals 0 and var1 is even).

I was trying to solve it like this:

idx2<-data$var1%%2 & data$sampleNo==0
sample(data[idx2,], size=10, replace=F)

But how can I set sampleNo to 2?


Thank you very much for your help!

David

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Sample of a subsample

Bert Gunter-2
For personal aesthetic reasons, I changed the name "data" to "dat".

Your code, with a slight modification:

set.seed (1357)  ## for reproducibility
dat <- data.frame(var1=seq(1:40), var2=seq(40,1))
dat$sampleNo <- 0
idx <- sample(seq(1,nrow(dat)), size=10, replace=F)
dat[idx,"sampleNo"] <-1

## yielding
> dat

   var1 var2 sampleNo
1     1   40        0
2     2   39        1
3     3   38        0
4     4   37        0
5     5   36        0
6     6   35        1
7     7   34        0
8     8   33        0
9     9   32        0
10   10   31        0
11   11   30        0
12   12   29        0
13   13   28        0
14   14   27        0
15   15   26        1
16   16   25        1
17   17   24        0
18   18   23        0
19   19   22        0
20   20   21        1
21   21   20        0
22   22   19        1
23   23   18        0
24   24   17        1
25   25   16        0
26   26   15        1
27   27   14        0
28   28   13        0
29   29   12        0
30   30   11        0
31   31   10        0
32   32    9        0
33   33    8        0
34   34    7        0
35   35    6        1
36   36    5        0
37   37    4        1
38   38    3        0
39   39    2        0
40   40    1        0

## This is basically a transcription of your specification into indexing logic

dat <- within(dat, sampleNo[sample(var1[(var1 %% 2 == 0) & sampleNo == 0],
                                   10, replace = FALSE)] <- 2)

##yielding
> dat

   var1 var2 sampleNo
1     1   40        0
2     2   39        1
3     3   38        0
4     4   37        2
5     5   36        0
6     6   35        1
7     7   34        0
8     8   33        2
9     9   32        0
10   10   31        2
11   11   30        0
12   12   29        0
13   13   28        0
14   14   27        2
15   15   26        1
16   16   25        1
17   17   24        0
18   18   23        2
19   19   22        0
20   20   21        1
21   21   20        0
22   22   19        1
23   23   18        0
24   24   17        1
25   25   16        0
26   26   15        1
27   27   14        0
28   28   13        2
29   29   12        0
30   30   11        2
31   31   10        0
32   32    9        2
33   33    8        0
34   34    7        2
35   35    6        1
36   36    5        2
37   37    4        1
38   38    3        0
39   39    2        0
40   40    1        0
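
The within() call above uses the sampled values of var1 directly as row positions, which works here because var1 happens to equal the row number (1 to 40). A minimal sketch of a variant that does not rely on this, assuming a made-up data frame whose var1 starts at 101 (the data and the name `eligible` are illustrative only), samples row positions obtained from which() instead:

```r
set.seed(1357)  # any seed; only makes the sketch repeatable

# Hypothetical data: var1 no longer matches the row number
dat <- data.frame(var1 = seq(101, 140), var2 = seq(40, 1))
dat$sampleNo <- 0
dat[sample(nrow(dat), size = 10), "sampleNo"] <- 1

# Row positions (not var1 values) of the cases eligible for sample 2
eligible <- which(dat$var1 %% 2 == 0 & dat$sampleNo == 0)

# Index into 'eligible' rather than calling sample(eligible, 10):
# sample(x, ...) would treat a single number x as 1:x
dat$sampleNo[eligible[sample.int(length(eligible), 10)]] <- 2

table(dat$sampleNo)
```

With 20 even values and at most 10 of them taken by the first sample, there are always at least 10 eligible rows, so the second draw cannot fail in this setup.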






Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Sep 25, 2017 at 10:27 AM, David Studer <[hidden email]> wrote:


Re: Sample of a subsample

Eric Berger
Hi David,
I was about to post a reply when Bert responded. His answer is good
and his comment to use the name 'dat' rather than 'data' is instructive.
I am providing my suggestion as well because I think it may address
what was causing you some confusion (mainly to use "which", but also
the missing !)

idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ), size=10,
replace=F)
data[idx2,]$sampleNo <- 2
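
To make the two points concrete, here is a small sketch on toy values (the names `odd`, `even` and `df` are illustrative and not part of the original post). It shows that the remainder `var1 %% 2`, used as a logical, selects the odd cases, and that a data frame behaves like a list of columns, so passing it to sample() does not draw rows:

```r
var1 <- 1:6

# Used as a logical, the remainder is TRUE for odd numbers ...
odd  <- as.logical(var1 %% 2)
# ... so it has to be negated (or compared with 0) to pick the even ones
even <- !var1 %% 2

var1[odd]    # 1 3 5
var1[even]   # 2 4 6

# which() turns the logical vector into positions that sample() can draw from
which(even)  # 2 4 6

# A data frame is a list of columns, so length() counts columns, not rows
df <- data.frame(a = 1:6, b = 6:1)
length(df)   # 2
```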

Eric



On Mon, Sep 25, 2017 at 9:03 PM, Bert Gunter <[hidden email]> wrote:


Re: Sample of a subsample

Bert Gunter-2
Yes.

Beating a pretty weary horse, a slightly cleaner version of my prior
offering, using with() instead of within(), is:

## with() only evaluates an expression; the assignment has to happen outside it
dat[with(dat, sample(which(!var1%%2 & !sampleNo), 10, replace=FALSE)),
"sampleNo"] <- 2

with() and within() are convenient ways to avoid having to repeatedly name
the columns via $. Note also the use of logical subscripting, in which
numeric 0 is coerced to FALSE and any nonzero value to TRUE
(which I should have done previously).
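
As a rough illustration of the difference between the two functions and of the 0/nonzero coercion, here is a sketch on a six-row toy data frame (the data and the names `rows` and `dat2` are made up for this example):

```r
dat <- data.frame(var1 = 1:6, sampleNo = c(0, 1, 0, 0, 1, 0))

# with() evaluates an expression using the columns and returns its value;
# dat itself is not changed
rows <- with(dat, which(!var1 %% 2 & !sampleNo))
rows  # 4 6

# within() returns a modified copy of the data frame, which must be assigned
dat2 <- within(dat, sampleNo[!var1 %% 2 & !sampleNo] <- 2)
dat2$sampleNo  # 0 1 0 2 1 2

# The coercion used above: 0 becomes FALSE, anything nonzero TRUE
as.logical(c(0, 1, 2))  # FALSE TRUE TRUE
```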

Cheers,
Bert




On Mon, Sep 25, 2017 at 11:43 AM, Eric Berger <[hidden email]> wrote:
