Hi all - I'm just wondering what sort of code people write to essentially perform an aggregate call, but with different functions being applied to the various columns.

For example, suppose I have a data frame x and would like to aggregate its rows by a factor f, but apply mean() to col1 and median() to col2.

If I wanted to apply mean() to both columns, I would call:

aggregate(x, list(f), mean)

But to get the mean of col1 and the median of col2, I have to write separate tapply() calls, then wrap the results back into a data frame:

data.frame(tapply(x$col1, f, mean), tapply(x$col2, f, median))

This is a somewhat inelegant solution for data frames with potentially many columns.

What I would like is for aggregate() to take a list of functions, one per column, something like:

aggregate(x, list(f), list(mean, median))

I'm just curious how others get around this limitation in aggregate(). Do most people simply make the individual tapply() calls, then possibly wrap them back up (as done in the example above), or is there a more elegant solution using some function of R that I might be unaware of?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
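For reference, the per-column aggregation asked for above can be sketched in base R with one tapply() call per column, driven by a named list of functions. The helper name aggregate_each() and the example data are hypothetical (this is not an existing R function); the sketch assumes funs is named by column and that each function returns a single value per group.

```r
# Hypothetical helper, not part of base R: summarise each column of `x`
# within the groups defined by `f`, using that column's own function.
# `funs` must be a named list whose names are column names of `x`, and
# each function must return a single value per group.
aggregate_each <- function(x, f, funs) {
  f <- factor(f)  # drop unused levels so every level has data
  cols <- lapply(names(funs), function(col) {
    # one tapply() per column, each with its own summary function
    as.vector(tapply(x[[col]], f, funs[[col]]))
  })
  names(cols) <- names(funs)
  # tapply() returns results in the order of levels(f)
  data.frame(group = levels(f), cols)
}

# Example: mean of col1 and median of col2 within groups a and b
x <- data.frame(col1 = c(1, 2, 3, 10, 20, 30),
                col2 = c(5, 1, 9, 4, 4, 100))
f <- c("a", "a", "a", "b", "b", "b")
res <- aggregate_each(x, f, list(col1 = mean, col2 = median))
res
#   group col1 col2
# 1     a    2    5
# 2     b   20    4
```

Because the functions are looked up by column name, adding another column only means adding another entry to the list.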
On Thu, Jul 15, 2010 at 10:45 PM, Murat Tasan <[hidden email]> wrote:
> i'm just curious how others get around this limitation in aggregate().
> do most simply make the individual tapply() calls separately, then
> possibly wrap them back up (as done in the example above), or is there
> a more elegant solution using some function of R that i might be
> unaware of?

Using sqldf we can write:

> library(sqldf)
> sqldf("select Treatment, avg(conc), median(uptake) from CO2 group by Treatment")
   Treatment avg(conc) median(uptake)
1    chilled       435           19.7
2 nonchilled       435           31.3

See http://sqldf.googlecode.com for more info.
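A base-R counterpart of the sqldf query, as a sketch: run aggregate() once per column with that column's own function, then merge() the pieces by Treatment. It uses only the built-in CO2 data set and assumes an R version whose aggregate() accepts a formula; the column names are relabelled only to mirror the SQL output.

```r
# Sketch: base-R counterpart of the sqldf query (not how sqldf works
# internally). One aggregate() per column, each with its own function,
# then merge() the pieces by the grouping column.
data(CO2)  # built-in data set from the datasets package

conc_mean     <- aggregate(conc ~ Treatment, data = CO2, FUN = mean)
uptake_median <- aggregate(uptake ~ Treatment, data = CO2, FUN = median)

res <- merge(conc_mean, uptake_median, by = "Treatment")
names(res) <- c("Treatment", "avg(conc)", "median(uptake)")  # mirror SQL labels
res
```

This scales less gracefully than the single SQL statement when many columns need different functions, but it needs no extra package.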
In reply to this post by Murat Tasan-3
Hi:
A nice package for doing this sort of thing is doBy. Let's manufacture an example since you didn't provide one:

set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30), x2 = rnorm(30, mean = 5),
                x3 = rnorm(30, mean = 10, sd = 4))

# --Case 1: no grouping variables
# If there are no grouping variables, you can define a function to apply
# to each variable (column) with the apply() function.
f <- function(x) c(mean(x), median(x))

# Apply to all numeric variables (not column 1):
apply(d[, -1], 2, f)
             x1       x2       x3
[1,] -0.0647788 4.813318 10.21010
[2,] -0.0881492 4.916123 10.68559

# The mean of each variable is in the first row, the median in the second.

# --Case 2: one or more grouping variables
library(doBy)
# If you have grouping variables, you can create a function with
# names to apply to each variable groupwise. Notice that I named the
# output variables mean and median, normally a no-no, and watch what
# happens when it is used in summaryBy().

# Define the output function to apply to each variable
f2 <- function(x) c(mean = mean(x), median = median(x))

# The leading dot on the left-hand side of the formula in summaryBy()
# indicates that the summary function is to be applied to all variables
# not on the RHS of the formula:
summaryBy(. ~ g, data = d, FUN = f2)
  g     x1.mean   x1.median  x2.mean x2.median   x3.mean x3.median
1 a  0.04571262 -0.06361278 4.253444  4.223015 11.259677  11.06834
2 b -0.15746011 -0.14223959 4.913657  5.116526 10.037674  11.32120
3 c -0.08258890 -0.06227865 5.272853  5.524493  9.332942  10.14600

You can use multiple grouping variables in the formula if desired. The function is applied to each LHS variable in each subgroup. summaryBy() requires its input object to be a data frame. The doBy package comes with a well-written vignette in which all of this is described.
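For comparison, here is a sketch of the Case 2 summary without doBy. It assumes an R version whose aggregate() has a formula method and accepts a FUN returning several values, in which case each summarised column comes back as a two-column matrix; do.call(data.frame, ...) then flattens those into names like x1.mean, mirroring the summaryBy() layout.

```r
# Sketch of Case 2 without doBy. Assumes an R version whose aggregate()
# has a formula method and accepts a FUN returning several values.
set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30), x2 = rnorm(30, mean = 5),
                x3 = rnorm(30, mean = 10, sd = 4))
f2 <- function(x) c(mean = mean(x), median = median(x))

# ". ~ g": summarise every column except g within each level of g.
# x1, x2 and x3 each come back as a 2-column matrix (mean, median).
res <- aggregate(. ~ g, data = d, FUN = f2)

# Flatten the matrix columns into x1.mean, x1.median, ... as summaryBy() does
flat <- do.call(data.frame, res)
flat
```

Like summaryBy(), this still applies the same function to every column; it only shows that the grouped layout itself is available in base R.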
HTH,
Dennis
In reply to this post by Gabor Grothendieck
Hi
[hidden email] wrote on 16.07.2010 05:02:52:

> On Thu, Jul 15, 2010 at 10:45 PM, Murat Tasan <[hidden email]> wrote:
> > what i would like is for aggregate to take a list of functions for
> > columns, something like:
> >
> > aggregate(x, list(f), list(mean, median))

If you want to use aggregate(), have the function return both statistics wrapped in cbind():

aggregate(data, list(y), function(x) cbind(mean(x), median(x)))

Regards
Petr
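Petr's approach computes both statistics for every column; one way to then get a different statistic per column, as the original question asks, is to pick the wanted one afterwards. This is a sketch with made-up data; a named c() is used in place of cbind() here so that each result matrix carries the column labels mean and median.

```r
# Sketch building on Petr's aggregate() idea; the data are made up.
d <- data.frame(g  = c("a", "a", "a", "b", "b", "b"),
                x1 = c(1, 2, 3, 10, 20, 30),
                x2 = c(5, 1, 9, 4, 4, 100))

# Both statistics for every numeric column; the named c() gives each
# result matrix the column labels "mean" and "median"
both <- aggregate(d[-1], by = list(g = d$g),
                  FUN = function(x) c(mean = mean(x), median = median(x)))

# Keep the mean of x1 and the median of x2
res <- data.frame(g = both$g,
                  x1_mean = both$x1[, "mean"],
                  x2_median = both$x2[, "median"])
res
#   g x1_mean x2_median
# 1 a       2         5
# 2 b      20         4
```

This computes statistics that are then thrown away, which is harmless for mean and median on modest data but worth keeping in mind for expensive functions.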