
grouping

Hi all,

Assume that I have the following 10 data points:

x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

Sorting x gives:

y <- sort(x)   # 36 45 46 66 78 125 193 209 242 297

I want to group the sorted data points (y) so that each group has an equal number of observations. In this case there will be three groups: the first two groups will have three observations each and the third will have four.

group 1 = 36, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242, 297

Finally, I want to calculate the group means:

group 1 = 42
group 2 = 87
group 3 = 234

Can anyone help me out? In SAS I used to do this with PROC RANK.

Thanks in advance,
Val

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
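For this small example the requested grouping can be written out directly. This is a minimal base-R sketch that assumes the group sizes (3, 3, 4) are known and typed in by hand. It labels the sorted values, splits them, and averages each piece.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
y <- sort(x)

# One group label per sorted value: three 1s, three 2s, four 3s
grp <- rep(1:3, times = c(3, 3, 4))

groups      <- split(y, grp)          # list with elements "1", "2", "3"
group_means <- sapply(groups, mean)   # named numeric vector of group means

print(groups)
print(group_means)
```

With these data the means are 42.33, 89.67 and 235.25. These differ slightly from the rounded values quoted in the question.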

Re: grouping

On Apr 3, 2012, at 8:47 AM, Val wrote:

> Assume that I have the following 10 data points.
> x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>
> sort x  and get the following
>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

The methods below do not require a sorting step.

> I want to  group the sorted  data point (y)  into  equal number of
> observation per group. In this case there will be three groups.  The first
> two groups  will have three observation  and the third will have four
> observations
>
> Finally I want to calculate the group mean
>
> group 1  =  42
> group 2  =  87
> group 3  =  234

I hope those weren't answers from SAS.

> Can anyone help me out?

I usually do this with Hmisc::cut2, since it has a `g =` parameter that automatically applies the quantile splitting criterion, but this is done in base R:

> split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE))
$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297

> lapply(split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)
$`[36,65.9]`
[1] 42.33333

$`(65.9,189]`
[1] 89.66667

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:

> tapply(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE), mean)
 [36,65.9] (65.9,189]  (189,297] 
  42.33333   89.66667  235.25000 

> In SAS I used to do it using proc rank.

?quantile isn't equivalent to PROC RANK, but it will provide a useful basis for splitting or tabling functions.

David Winsemius, MD
West Hartford, CT
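If you want plain group numbers instead of interval labels, `cut()` can return integer codes via `labels = FALSE`. The sketch below reuses the breaks above, with the approximate probabilities .333 and .66. It adds `table()` as a quick check on the group sizes.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Quantile breaks with the approximate probabilities used in the reply
brks <- quantile(x, probs = c(0, .333, .66, 1))

# labels = FALSE makes cut() return integer group codes instead of a factor
grp <- cut(x, brks, include.lowest = TRUE, labels = FALSE)

print(table(grp))            # group sizes
print(tapply(x, grp, mean))  # group means, named by group number
```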

Re: grouping

Probably something along the following lines:

> x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
> sorted <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
> tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
        0         1         2         3 
 42.33333  89.66667 214.66667 297.00000 

Hope this helps,
Giovanni

--
Giovanni Petris <[hidden email]>
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
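With ten values and groups of three, the integer-division index above puts the leftover 297 in a fourth group of its own. The original question asks for leftovers to join the last group. One way to do that is to cap the index with `pmin()`. The helper name `rank_group_means` is hypothetical, and the sketch assumes `1 <= g <= length(x)`.

```r
# Hypothetical helper: sort x, cut it into g groups of length(x) %/% g values,
# and add any leftover values to the last group; returns the group means.
rank_group_means <- function(x, g) {
  stopifnot(g >= 1, g <= length(x))
  y    <- sort(x)
  size <- length(y) %/% g                           # values per full group
  grp  <- pmin((seq_along(y) - 1) %/% size + 1, g)  # cap the group index at g
  tapply(y, grp, mean)
}

x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
res <- rank_group_means(x, 3)
print(res)
```

The groups are formed by position in the sorted vector, so tied values can end up in different groups. A quantile-based `cut()` would keep them together.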

Re: grouping

Hi!

Maybe not the most elegant solution, but it works:

n <- length(x)
y <- sort(x)
for (i in seq(1, n - (n %% 3), 3)) {
  # fewer than six values left: the rest (3 to 5 values) forms the last group
  last <- if (n - i >= 5) i + 2 else n
  print(y[i:last])
  print(mean(y[i:last]))
}

Produces:

[1] 36 45 46
[1] 42.33333
[1]  66  78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo

Re: grouping

On Tue, Apr 03, 2012 at 09:31:29AM -0400, Val wrote:
> Thank you all (David, Michael, Giovanni) for your prompt responses.
>
> First, there was a typo in the group mean: it was 89.6, not 87.
>
> For a small data set and few groups I can use prob=c(0, .333, .66, 1)
> to split into three groups, as in this case. However, if I want to extend
> the number of groups to, say, 10 or 15, do I have to work out the
> probabilities in
>   split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1))))
> by hand?
>
> Is there a shortcut for that?

Hi.

There may be better ways for the whole task, but specifically the probabilities c(0, .333, .66, 1), or exactly c(0, 1/3, 2/3, 1), can be obtained as

  seq(0, 1, length=3+1)

Hope this helps.

Petr Savicky.
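The `seq()` call generalizes to any number of groups g, but check the resulting groups. With these ten values, exact thirds put the default (type 7) quantiles exactly on the data values 66 and 193. Because `cut()` intervals are closed on the right by default, both values drop into the lower group. The sketch below prints the breaks and group sizes. The sizes come out as 4, 3, 3 rather than the 3, 3, 4 obtained with .333 and .66. The sketch also shows left-closed intervals (`right = FALSE`) for comparison.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
g <- 3

# Probabilities 0, 1/3, 2/3, 1 for g groups
probs <- seq(0, 1, length.out = g + 1)
brks  <- quantile(x, probs = probs)
print(brks)

# Right-closed intervals (the cut() default): values equal to a break go down
grp <- cut(x, brks, include.lowest = TRUE, labels = FALSE)
print(table(grp))
print(tapply(x, grp, mean))

# Left-closed intervals: values equal to a break go up
grp_left <- cut(x, brks, include.lowest = TRUE, right = FALSE, labels = FALSE)
print(table(grp_left))
```

Which convention is appropriate depends on how you want values that sit exactly on a quantile, and ties in general, to be handled.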

Re: grouping

On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
> Hi All,
>
> On the same data points
> x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
>
> I want to have the following output as a data frame:
>
> x     group   group mean
> 46    1       42.3
> 125   2       89.6
> 36    1       42.3
> 193   3       235.25
> 209   3       235.25
> 78    2       89.6
> 66    2       89.6
> 242   3       235.25
> 297   3       235.25
> 45    1       42.3
>
> I tried the following code
>
> dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
> gxc <- with(dat, tapply(xc, group, mean))
> dat$gxc <- gxce[as.character(dat$group)]
> txc=dat$gxc
>
> it did not work for me.

David Winsemius suggested using ave() when you asked this question the first time. Can you have a look at it?

Petr Savicky.

Re: grouping

I did look at it; the result is below.

x=c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
#lapply(split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)
ave(split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)

> ave(split(x, cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)
$`[36,74]`
[1] NA

$`(74,197]`
[1] NA

$`(197,297]`
[1] NA

There were 11 warnings (use warnings() to see them)
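For reference, `ave()` has the signature `ave(x, ..., FUN = mean)`. The numeric vector comes first and the grouping variables go in `...`. Because `FUN` follows `...`, it must be named. The call above passes the list produced by `split()` as `x` and passes `mean` positionally, so `mean` is not used as `FUN`. Below is a sketch of the intended per-row group means that reuses the thread's quantile breaks.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Group number for each value, using the quantile breaks from the thread
grp <- cut(x, quantile(x, probs = c(0, .333, .66, 1)),
           include.lowest = TRUE, labels = FALSE)

# ave() takes the vector itself, then the grouping variable(s) in `...`;
# FUN has to be given by name (its default is already mean)
gmean <- ave(x, grp, FUN = mean)

dat <- data.frame(x = x, group = grp, gmean = gmean)
print(dat)
```

Since `FUN` defaults to `mean`, `ave(x, grp)` gives the same result.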

Re: grouping

On 03-04-2012, at 20:21, Val wrote:

> I tried the following code
>
> dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
> gxc <- with(dat, tapply(xc, group, mean))
> dat$gxc <- gxce[as.character(dat$group)]
> txc=dat$gxc
>
> it did not work for me.

I'm not surprised.

In the line dat <- there are 5 opening parentheses and 4 closing )'s.
In the line dat$gxc <- you reference an object gxce. Where was it created?

So I tried this:

> dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66, 1)), all.inside=TRUE))
> dat$gmean <- ave(dat$x, as.factor(dat$group))
> dat
     x group     gmean
1   46     1  42.33333
2  125     2  89.66667
3   36     1  42.33333
4  193     3 235.25000
5  209     3 235.25000
6   78     2  89.66667
7   66     2  89.66667
8  242     3 235.25000
9  297     3 235.25000
10  45     1  42.33333

Berend

Re: grouping

Thank you very much. It is working now. There was a typo in "gxce" in my message, but in the R code it was correct: gxc.