Select

Val-17
Hi all,

I have a data frame with two variables: group and its size (count).
mydat<- read.table( text='group  count
G1 25
G2 15
G3 12
G4 31
G5 10' , header = TRUE, as.is = TRUE )

I want to select group IDs at random (without replacement) until the
sum of count reaches at least 40.
So, in the first case, the selected rows could be
   G4 31
   G5 10

In another case, they could be
  G5 10
  G2 15
  G3 12

How do I impose the restriction that the sum of the count variable is at least 40?

Thank you in advance.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Select

David Carlson
First expand your data frame into a vector where G1 is repeated 25 times, G2 is repeated 15 times, etc. Then draw random samples of 40 from that vector:

> grp <- rep(mydat$group, mydat$count)
> grp.sam <- sample(grp, 40)
> table(grp.sam)
grp.sam
G1 G2 G3 G4 G5
10  9  5 13  3

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352


-----Original Message-----
From: R-help <[hidden email]> On Behalf Of Val
Sent: Monday, February 11, 2019 4:36 PM
To: [hidden email] ([hidden email]) <[hidden email]>
Subject: [R] Select

Re: Select

Val-17
Thank you, David.

However, this will not work for me. If a group ID is selected, then all
of its observations should be included.

On Mon, Feb 11, 2019 at 4:51 PM David L Carlson <[hidden email]> wrote:

Re: Select

Jeff Newmiller
This constraint was not clear in your original sample data set. Can you expand the data set to clarify how this requirement REALLY works?

On February 11, 2019 3:00:15 PM PST, Val <[hidden email]> wrote:

Re: Select

Göran Broström-3
In reply to this post by Val-17


On 2019-02-11 23:35, Val wrote:

> Hi all,
>
> I have a data frame with two variables: group and its size.
> mydat<- read.table( text='group  count
> G1 25
> G2 15
> G3 12
> G4 31
> G5 10' , header = TRUE, as.is = TRUE )
>

How about

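x <- sample(nrow(mydat))      # random order of the rows (groups)

total <- mydat$count[x[1]]    # running total, starting with the first group drawn
i <- 1
while (total < 40){           # keep adding groups until the total is at least 40
     i <- i + 1
     total <- total + mydat$count[x[i]]
}

print(mydat$group[x[1:i]])    # the selected group IDs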

Göran
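
The loop above could also be wrapped in a small function. This is only a sketch: the name draw_groups and the target argument are illustrative and do not come from the thread. It adds a guard for the case where the counts cannot reach the target at all, in which the plain loop would run past the end of the data.

```r
mydat <- read.table(text = "group count
G1 25
G2 15
G3 12
G4 31
G5 10", header = TRUE, as.is = TRUE)

# draw_groups() is a hypothetical helper name, not part of the thread.
draw_groups <- function(dat, target = 40) {
  # Guard: the target must be reachable with all groups combined.
  if (sum(dat$count) < target) stop("total count is below the target")
  x <- sample(nrow(dat))   # random order of rows, without replacement
  total <- 0
  i <- 0
  while (total < target) { # add whole groups until the target is met
    i <- i + 1
    total <- total + dat$count[x[i]]
  }
  dat[x[seq_len(i)], ]     # selected rows, in the order drawn
}

set.seed(1)                # only to make the example repeatable
sel <- draw_groups(mydat)
sum(sel$count) >= 40       # TRUE by construction
```

Returning the rows rather than only the IDs keeps each group's count next to its ID, which makes the total easy to verify.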


Re: Select

Val-17
In reply to this post by Jeff Newmiller
Sorry, Jeff and David, for not being clear!

The total sample size should be at least 40, but the selection should
be based on group ID. Different combinations of group IDs could give
at least 40.
If I select group G1 with a count of 25 and G2 with a count of 15,
then I get the minimum of 40 counts. So G1 and G2 are
selected.
G1  25
G2  15

In another scenario, if G2, G3 and G4 are selected, then the total
count will be 58, which is greater than 40. So G2, G3 and G4 could
be selected.
 G2 15
 G3 12
 G4 31

So the restriction is to find group IDs that give a minimum of 40.
Once I have reached a minimum of 40, I stop selecting groups and output
the data.

I hope this helps.
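
To make the stopping rule concrete, the two scenarios described here can be checked with running totals. This is a sketch only: the helper name first_reaching and the target argument are illustrative, and the counts are listed in the order the groups are assumed to have been drawn.

```r
# Counts in the order the groups are drawn, taken from the two scenarios.
scenario1 <- c(G1 = 25, G2 = 15)
scenario2 <- c(G2 = 15, G3 = 12, G4 = 31)

# first_reaching() is an illustrative helper: position of the first group
# at which the running total is at least `target`.
first_reaching <- function(counts, target = 40) {
  which(cumsum(counts) >= target)[1]
}

cumsum(scenario1)          # 25 40: the target is met exactly at G2
cumsum(scenario2)          # 15 27 58: the target is first met at G4
first_reaching(scenario1)  # position 2
first_reaching(scenario2)  # position 3
```

The first scenario hits 40 exactly, so the comparison has to be "greater than or equal to" for G1 and G2 alone to count as a complete selection.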




On Mon, Feb 11, 2019 at 5:09 PM Jeff Newmiller <[hidden email]> wrote:

Re: Select

Jeff Newmiller
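N <- 8 # however many times you want to do this
ans <- lapply( seq.int( N )
              , function( n ) {
                  # shuffle the row order, i.e. draw groups without replacement
                  idx <- sample( nrow( mydat ) )
                  # keep rows up to the first one where the running total reaches 40
                  # (use <= so that a total of exactly 40, e.g. G1 + G2, stops the draw)
                  mydat[ idx[ seq.int( which( 40 <= cumsum( mydat[ idx, "count" ] ) )[ 1 ] ) ], ]
                }
              )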
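
One way to sanity-check draws of this kind is sketched below. It is self-contained and assumes that "at least 40" means a running total of exactly 40 also ends a draw. The names target, draws, totals and without_last are illustrative and not from the thread.

```r
mydat <- read.table(text = "group count
G1 25
G2 15
G3 12
G4 31
G5 10", header = TRUE, as.is = TRUE)

target <- 40               # assumed threshold, "at least 40" in the thread
set.seed(42)               # only to make the example repeatable

draws <- lapply(seq_len(8), function(n) {
  idx <- sample(nrow(mydat))                         # shuffled row order
  k <- which(cumsum(mydat$count[idx]) >= target)[1]  # first position reaching target
  mydat[idx[seq_len(k)], ]
})

# Every draw reaches the target ...
totals <- sapply(draws, function(d) sum(d$count))
all(totals >= target)

# ... and none keeps a group beyond the one that reached it.
without_last <- sapply(draws, function(d) sum(head(d$count, -1)))
all(without_last < target)
```

The second check is what separates stopping as soon as the target is met from simply returning any subset whose total is large enough.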


On Mon, 11 Feb 2019, Val wrote:


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

Re: Select

Val-17
Thank you very much, Jeff, Göran and David, for your help.


On Mon, Feb 11, 2019 at 6:22 PM Jeff Newmiller <[hidden email]> wrote:
