grouping


grouping

Val-17
Hi all,

Assume that I have the following 10 data points.
 x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

I want to split the sorted data points (y) into groups with an equal number
of observations per group. In this case there will be three groups: the first
two groups will have three observations each and the third will have four.

group 1  = 36, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234

Can anyone help me out?

In SAS I used to do it using proc rank.

thanks in advance

Val


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: grouping

David Winsemius

On Apr 3, 2012, at 8:47 AM, Val wrote:

> Hi all,
>
> Assume that I have the following 10 data points.
> x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>
> sort x  and get the following
>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

The methods below do not require a sorting step.

>
> I want to  group the sorted  data point (y)  into  equal number of
> observation per group. In this case there will be three groups.  The  
> first
> two groups  will have three observation  and the third will have four
> observations
>
> group 1  = 36, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242,297
>
> Finally I want to calculate the group mean
>
> group 1  =  42
> group 2  =  87
> group 3  =  234

I hope those weren't answers from SAS.

>
> Can anyone help me out?
>

I usually do this with Hmisc::cut2, since its `g = <n>` argument
automatically splits at the quantiles, but the following is done in
base R.

 > split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)), include.lowest=TRUE))
$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297


 > lapply(split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)), include.lowest=TRUE)), mean)
$`[36,65.9]`
[1] 42.33333

$`(65.9,189]`
[1] 89.66667

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:
 > tapply(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)), include.lowest=TRUE), mean)
  [36,65.9] (65.9,189]  (189,297]
   42.33333   89.66667  235.25000

> In SAS I used to do it using proc rank.

?quantile isn't equivalent to  Proc Rank but it will provide a useful  
basis for splitting or tabling functions.
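
For readers coming from a rank-based workflow, here is a minimal sketch that assigns groups from ranks instead of quantile break points. The `ceiling(rank * k / n)` rule is an illustrative assumption chosen because it yields group sizes 3, 3, 4 for this data; it is not claimed to reproduce what PROC RANK does.

```r
# Sketch: equal-count groups from ranks rather than quantile break points.
# The ceiling(rank * k / n) rule is an illustrative choice, not PROC RANK's rule.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
k <- 3                                  # number of groups wanted
r <- rank(x, ties.method = "first")     # ranks 1..n, ties broken by position
grp <- ceiling(r * k / length(x))       # group index in 1..k
grp_means <- tapply(x, grp, mean)       # one mean per group
split(x, grp)                           # members of each group
grp_means
```

Because the groups come from ranks, a data value can never sit exactly on a break point, which sidesteps the question of which side of an interval is closed.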

>
> thanks in advance
>
> Val
>

David Winsemius, MD
West Hartford, CT


Re: grouping

Michael Weylandt
In reply to this post by Val-17
Ignoring the fact your desired answers are wrong, I'd split the
separating part and the group means parts into three steps:

i) quantile() can help you get the split points,
ii)  findInterval() can assign each y to a group
iii) then ave() or tapply() will do group-wise means

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine findInterval
and quantile into a single step.

Whether ave() or tapply() is the better fit depends on your desired output.

Hope that helps,
Michael
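
The three steps above can be sketched on the unsorted x as well, since findInterval() does not need sorted data (only sorted break points). The `+ 1L` shift and the variable names `inner`, `grp` and `grp_mean` are my own additions for illustration.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)   # unsorted on purpose
inner <- quantile(x, probs = c(0.33, 0.66))           # interior split points only
grp <- findInterval(x, inner) + 1L                    # shift 0-based codes to 1..3
grp_mean <- ave(x, grp)                               # each x replaced by its group's mean
cbind(x, grp, grp_mean)
```

ave() keeps the result aligned with x, while tapply() would return one value per group.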


Re: grouping

Giovanni Petris
In reply to this post by Val-17
Probably something along the following lines:

> x <- c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
> sorted <- c(36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
> tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
        0         1         2         3
 42.33333  89.66667 214.66667 297.00000

Hope this helps,
Giovanni
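
The index above puts the tenth point in a fourth group of its own. A minimal sketch, assuming leftover points should instead join the last full group (as the original question asks); `size`, `ngrps` and `idx` are names introduced here for illustration:

```r
sorted <- sort(c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45))
size <- 3                                  # points per full group
ngrps <- length(sorted) %/% size           # number of full groups (3 here)
# Position-based index, capped so that leftovers join the last group
idx <- pmin((seq_along(sorted) - 1) %/% size + 1, ngrps)
m <- tapply(sorted, idx, mean)
m
```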


--

Giovanni Petris  <[hidden email]>
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/


Re: grouping

K. Elo
In reply to this post by Val-17
Hi!

Maybe not the most elegant solution, but works:

data <- sort(c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45))
n <- length(data)
starts <- seq(1, n - n %% 3, by = 3)   # first index of each group of three
for (i in starts) {
  # the last group also takes any leftover points
  idx <- if (i == max(starts)) i:n else i:(i + 2)
  print(data[idx]); print(mean(data[idx]))
}

Produces:

[1] 36 45 46
[1] 42.33333
[1]  66  78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo


Re: grouping

Val-17
In reply to this post by Michael Weylandt
Thank you all (David, Michael, Giovanni)  for your prompt response.

First, there was a typo in the group means: the second one should be 89.6, not 87.

For a small data set and few groups I can use  prob=c(0, .333, .66 ,1)
to split into three groups, as in this case. However, if I want to extend the
number of groups to, say, 10 or 15, do I have to work out the probabilities by hand for
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))

Is there a shortcut for that?


Thanks

Re: grouping

Michael Weylandt
Use cut2 as I suggested and David demonstrated.

Michael
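
For reference, a sketch of the cut2 route. It assumes the Hmisc package is installed; `g` is cut2's argument for the number of quantile groups. The interval labels and the handling of points that sit on a break are left to cut2, so no output is shown here.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
if (requireNamespace("Hmisc", quietly = TRUE)) {
  grp <- Hmisc::cut2(x, g = 3)     # factor of quantile groups
  print(tapply(x, grp, mean))      # mean of x within each group
} else {
  message("Hmisc is not installed; see the base R versions in this thread.")
}
```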


Re: grouping

Petr Savicky
In reply to this post by Val-17
On Tue, Apr 03, 2012 at 09:31:29AM -0400, Val wrote:

> Thank you all (David, Michael, Giovanni)  for your prompt response.
>
> First there was a typo error for the group mean it was 89.6 not 87.
>
> For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
> to group in to three groups in this case. However,  if I want to extend the
> number of groupings say 10 or 15 then do I have to figure it out the
>   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
>
> Is there a short cut for that?

Hi.

There may be better ways for the whole task, but specifically
c(0, .333, .66 ,1) can be obtained (up to rounding) as

  seq(0, 1, length=3+1)

Hope this helps.

Petr Savicky.
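
A short sketch of the same idea for an arbitrary number of groups. `ngrps` is a name introduced here; `length.out` is the full name of the argument that `length=` abbreviates in the call above.

```r
ngrps <- 3
probs <- seq(0, 1, length.out = ngrps + 1)   # 0, 1/3, 2/3, 1
probs
deciles <- seq(0, 1, length.out = 10 + 1)    # 0, 0.1, ..., 1 for ten groups
deciles
```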


Re: grouping

David Winsemius
In reply to this post by Michael Weylandt

On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

> Use cut2 as I suggested and David demonstrated.

Agree that Hmisc::cut2 is extremely handy, and I also like the fact
that the closed ends of its intervals are on the left side (which is not
the same behavior as cut()). This has the other effect of setting
include.lowest = TRUE, which is not the default for cut() either (to my
continued amazement).

But let me add the method I use when doing it "by hand":

cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)

--
David.
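
To illustrate why include.lowest = TRUE matters in the "by hand" method, here is a small sketch on the thread's data; `brks`, `g_open` and `g_closed` are names introduced for the example.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
ngrps <- 3
brks <- quantile(x, probs = seq(0, 1, length.out = ngrps + 1))
g_open   <- cut(x, brks)                          # default: lowest break is excluded
g_closed <- cut(x, brks, include.lowest = TRUE)   # lowest break is included
x[is.na(g_open)]     # the minimum is left without a group
anyNA(g_closed)      # every point is assigned
```

One caveat, stated as an observation rather than a rule: with exact thirds the breaks can land on data values (66 and 193 for this x), so which group those points join depends on which side of the interval is closed, and the group sizes need not come out as 3, 3, 4.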



David Winsemius, MD
West Hartford, CT


Re: grouping

David Carlson
In reply to this post by Michael Weylandt
Or just replace c(0, .333, .667, 1) with

n <- 10
split(x, cut(x, quantile(x, prob= c(0, 1:(n-1)/n, 1)), include.lowest=TRUE))

where n is the number of groups you want.
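
The same expression can be wrapped in a small helper. This is only a sketch: `quantile_groups` is a hypothetical name, it assumes n >= 2, and cut() will stop with an error if heavy ties make the quantile breaks non-unique.

```r
# Hypothetical helper: integer group codes 1..n from quantile breaks
quantile_groups <- function(x, n) {
  brks <- quantile(x, probs = c(0, 1:(n - 1) / n, 1))
  cut(x, brks, include.lowest = TRUE, labels = FALSE)
}

z <- 1:100
counts <- tabulate(quantile_groups(z, 5))   # number of points in each of 5 groups
counts
```

With labels = FALSE, cut() returns integer codes instead of a factor, which is convenient as a grouping column.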

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352




Re: grouping

Val-17
In reply to this post by David Winsemius
David W and all,

Thank you very much for your help.

Here is the final output that I want in the form of a data frame. The data
frame should contain x, group and group_mean, as follows:

x      group   group_mean
46     1       42.3
125    2       89.6
36     1       42.3
193    3       235.25
209    3       235.25
78     2       89.6
66     2       89.6
242    3       235.25
297    3       235.25
45     1       42.3

Thanks a lot

Re: grouping

David Winsemius

On Apr 3, 2012, at 10:11 AM, Val wrote:

> David W and all,
>
> Thank you very much for your help.
>
> Here is the final output that I want in the form of data frame. The  
> data frame should contain  x, group and group_ mean in the following  
> way
>
> x       group   group mean
> 46       1        42.3
> 125     2        89.6
> 36       1        42.3
> 193     3        235.25
> 209     3        235.25
> 78       2        89.6
> 66       2        89.6
> 242     3        235.25
> 297     3        235.25
> 45       1        42.3

If you want group means in a vector the same length as x, then instead
of using tapply as done in earlier solutions you should use `ave`.

--
DW
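
A minimal sketch of the difference, reusing the break points from earlier in the thread (`grp` and `grp_mean` are names introduced here):

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
grp <- cut(x, quantile(x, probs = c(0, .333, .66, 1)), include.lowest = TRUE)
tapply(x, grp, mean)       # length 3: one mean per group
grp_mean <- ave(x, grp)    # length 10: each point's group mean, in the order of x
round(grp_mean, 2)
```

ave() applies mean() within each level of grp by default and returns the results in the original order of x.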



David Winsemius, MD
West Hartford, CT



Re: grouping

Val-17
Hi All,

On the same data  points
x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

I want to have the following output as a data frame

x       group   group mean
46       1        42.3
125     2        89.6
36       1        42.3
193     3        235.25
209     3        235.25
78       2        89.6
66       2        89.6
242     3        235.25
297     3        235.25
45       1        42.3

I tried the following code


dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc

It did not work for me.

On Tue, Apr 3, 2012 at 10:15 AM, David Winsemius <[hidden email]> wrote:

>
> On Apr 3, 2012, at 10:11 AM, Val wrote:
>
> David W and all,
>
> Thank you very much for your help.
>
> Here is the final output that I want, in the form of a data frame. The data
> frame should contain x, group and group mean, in the following way
>
> x       group   group mean
> 46       1        42.3
> 125     2        89.6
> 36       1        42.3
> 193     3        235.25
> 209     3        235.25
> 78       2        89.6
> 66       2        89.6
> 242     3        235.25
> 297     3        235.25
> 45       1        42.3
>
>
> If you want group means in a vector the same length as x, then instead of
> using tapply as done in earlier solutions you should use `ave`.
>
> --
> DW
>
>
>
> Thanks a lot
>
> On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius <[hidden email]> wrote:
>
>>
>> On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
>>
>>  Use cut2 as I suggested and David demonstrated.
>>>
>>
>> Agreed that Hmisc::cut2 is extremely handy. I also like the fact that
>> the closed ends of the intervals are on the left side (which is not the
>> behavior of cut()). It also has the effect of setting include.lowest =
>> TRUE, which is not the default for cut() either (to my continued amazement).
>>
>> But let me add the method I use when doing it "by hand":
>>
>> cut(x, quantile(x, probs=seq(0, 1, length.out=ngrps+1)), include.lowest=TRUE)
>>
>> --
>> David.
>>
>>
>>
>>
>>> Michael
>>>
>>> On Tue, Apr 3, 2012 at 9:31 AM, Val <[hidden email]> wrote:
>>>
>>>> Thank you all (David, Michael, Giovanni)  for your prompt response.
>>>>
>>>> First, there was a typo in the group mean: it should be 89.6, not 87.
>>>>
>>>> For a small data set and a few groups I can use prob=c(0, .333, .66, 1)
>>>> to split into three groups, as in this case. However, if I want to
>>>> extend the number of groups to, say, 10 or 15, do I have to work out
>>>> the probabilities by hand for
>>>>  split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1))))
>>>>
>>>> Is there a shortcut for that?
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
>>>> <[hidden email]> wrote:
>>>>
>>>>>
>>>>> Ignoring the fact your desired answers are wrong, I'd split the
>>>>> separating part and the group means parts into three steps:
>>>>>
>>>>> i) quantile() can help you get the split points,
>>>>> ii)  findInterval() can assign each y to a group
>>>>> iii) then ave() or tapply() will do group-wise means
>>>>>
>>>>> Something like:
>>>>>
>>>>> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
>>>>> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
>>>>> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>>>>>
>>>>> You could also use cut2 from the Hmisc package to combine findInterval
>>>>> and quantile into a single step.
>>>>>
>>>>> Depending on your desired output.
>>>>>
>>>>> Hope that helps,
>>>>> Michael
>>>>>
>>>>> On Tue, Apr 3, 2012 at 8:47 AM, Val <[hidden email]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Assume that I have the following 10 data points.
>>>>>>  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>>>>>>
>>>>>> sort x  and get the following
>>>>>>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>>>>>>
>>>>>> I want to  group the sorted  data point (y)  into  equal number of
>>>>>> observation per group. In this case there will be three groups.  The
>>>>>> first
>>>>>> two groups  will have three observation  and the third will have four
>>>>>> observations
>>>>>>
>>>>>> group 1  = 34, 45, 46
>>>>>> group 2  = 66, 78, 125
>>>>>> group 3  = 193, 209, 242,297
>>>>>>
>>>>>> Finally I want to calculate the group mean
>>>>>>
>>>>>> group 1  =  42
>>>>>> group 2  =  87
>>>>>> group 3  =  234
>>>>>>
>>>>>> Can anyone help me out?
>>>>>>
>>>>>> In SAS I used to do it using proc rank.
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Val
>>>>>>


Re: grouping

Petr Savicky
On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:

> Hi All,
>
> On the same data  points
> x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
>
> I want to have the following output as a data frame
>
> x       group   group mean
> 46       1        42.3
> 125     2        89.6
> 36       1        42.3
> 193     3        235.25
> 209     3        235.25
> 78       2        89.6
> 66       2        89.6
> 242     3        235.25
> 297     3        235.25
> 45       1        42.3
>
> I tried the following code
>
>
> dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
> gxc <- with(dat, tapply(xc, group, mean))
> dat$gxc <- gxce[as.character(dat$group)]
> txc=dat$gxc
>
> it did not work for me.

David Winsemius suggested using ave() when you first asked this
question. Can you have a look at it?

Petr Savicky.


Re: grouping

Val-17
I did look at it; the result is below.

x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

# lapply(split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)
ave(split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)

> ave(split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)
$`[36,74]`
[1] NA

$`(74,197]`
[1] NA

$`(197,297]`
[1] NA

There were 11 warnings (use warnings() to see them)






Re: grouping

Berend Hasselman
In reply to this post by Val-17

On 03-04-2012, at 20:21, Val wrote:

> Hi All,
>
> On the same data  points
> x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
>
> I want to have the following output as a data frame
>
> x       group   group mean
> 46       1        42.3
> 125     2        89.6
> 36       1        42.3
> 193     3        235.25
> 209     3        235.25
> 78       2        89.6
> 66       2        89.6
> 242     3        235.25
> 297     3        235.25
> 45       1        42.3
>
> I tried the following code
>
>
> dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
> gxc <- with(dat, tapply(xc, group, mean))
> dat$gxc <- gxce[as.character(dat$group)]
> txc=dat$gxc
>
> it did not work for me.
>

I'm not surprised.

In the line dat <- there are 5 opening parentheses and 4 closing )'s.
In the line dat$gxc <- you reference an object gxce. Where was it created?

So I tried this

> dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66 ,1)), all.inside=TRUE))
> dat$gmean <- ave(dat$x, as.factor(dat$group))
> dat
     x group     gmean
1   46     1  42.33333
2  125     2  89.66667
3   36     1  42.33333
4  193     3 235.25000
5  209     3 235.25000
6   78     2  89.66667
7   66     2  89.66667
8  242     3 235.25000
9  297     3 235.25000
10  45     1  42.33333
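
The two steps above can be wrapped in a small function. This is only a sketch: the name add_group_means and its probs argument are made up here for illustration and do not come from the thread; quantile(), findInterval() and ave() are called the same way as in the lines above.

```r
# Hypothetical helper (not from the thread): bin x at quantile breakpoints,
# then attach each observation's group mean with ave().
add_group_means <- function(x, probs) {
  breaks <- quantile(x, probs = probs)
  # all.inside = TRUE keeps max(x) in the last bin instead of opening a new one
  group <- findInterval(x, breaks, all.inside = TRUE)
  data.frame(x = x, group = group, gmean = ave(x, group))
}

x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
dat <- add_group_means(x, probs = c(0, 0.333, 0.66, 1))
dat
```

For ngrps groups, probs = seq(0, 1, length.out = ngrps + 1) could be passed instead of typing the probabilities by hand. An observation that is tied with a breakpoint can land on either side of it, because findInterval() closes its intervals on the left while cut() by default closes them on the right, so equal group sizes are not guaranteed.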

Berend


Re: grouping

Val-17
Thank you very much. It is working now. There was a typo in "gxce";
in my actual R code it was correct (gxc).





Re: grouping

Berend Hasselman

On 03-04-2012, at 21:02, Val wrote:

>
>
> On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman <[hidden email]> wrote:
>
>
> > dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66 ,1)), all.inside=TRUE))
> > dat$gmean <- ave(dat$x, as.factor(dat$group))

Also, the as.factor() call is not necessary. This will do:

dat$gmean <- ave(dat$x, dat$group)
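
A quick way to check that claim, as a sketch: the group codes below are typed in from the output table earlier in the thread rather than recomputed, and the variable names are only illustrative.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
group <- c(1, 2, 1, 3, 3, 2, 2, 3, 3, 1)  # copied from the table, not recomputed

with_factor    <- ave(x, as.factor(group))
without_factor <- ave(x, group)

# ave() converts the grouping vector itself, so both calls should agree
same <- isTRUE(all.equal(with_factor, without_factor))
same
```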

Berend


Re: grouping

Michael Weylandt
In reply to this post by Val-17
Please take a look at my first reply to you:

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))

Then read ?ave for an explanation of the syntax. ave takes two
vectors, the first being the data to be averaged, the second being an
index to split by. You don't want to use split() here.
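
To make the difference in output shape concrete, here is a small sketch using the same y and breakpoints as the line above; idx, per_obs and per_grp are names chosen here for illustration only.

```r
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

# Only the two inner breakpoints are supplied, so findInterval() numbers
# the groups 0, 1, 2 rather than 1, 2, 3.
idx <- findInterval(y, quantile(y, c(0.33, 0.66)))

per_obs <- ave(y, idx)           # one value per observation, same length as y
per_grp <- tapply(y, idx, mean)  # one value per group

per_obs
per_grp
```

For a data frame with one row per observation, per_obs is the column to attach; per_grp is the compact per-group summary.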

Michael
