aggregate(...) with multiple functions

Murat Tasan-3
hi all - i'm just wondering what sort of code people write to
essentially perform an aggregate call, but with different functions
being applied to the various columns.

for example, if i have a data frame x and would like to marginalize by
a factor f for the rows, but apply mean() to col1 and median() to
col2.

if i wanted to apply mean() to both columns, i would call:

aggregate(x, list(f), mean)

but to get the mean of col1 and the median of col2, i have to write
separate tapply calls, then wrap back into a data frame:

data.frame(tapply(x$col1, f, mean), tapply(x$col2, f, median))

this is a somewhat inelegant solution for data frames with potentially
many columns.

what i would like is for aggregate to take a list of functions for
columns, something like:

aggregate(x, list(f), list(mean, median))


i'm just curious how others get around this limitation in aggregate().
do most simply make the individual tapply() calls separately, then
possibly wrap them back up (as done in the example above), or is there
a more elegant solution using some function of R that i might be
unaware of?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: aggregate(...) with multiple functions

Gabor Grothendieck
On Thu, Jul 15, 2010 at 10:45 PM, Murat Tasan <[hidden email]> wrote:

> what i would like is for aggregate to take a list of functions for
> columns, something like:
>
> aggregate(x, list(f), list(mean, median))

Using sqldf we can write:

> library(sqldf)
> sqldf("select Treatment, avg(conc), median(uptake) from CO2 group by Treatment")
   Treatment avg(conc) median(uptake)
1    chilled       435           19.7
2 nonchilled       435           31.3

See http://sqldf.googlecode.com for more info.
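For comparison, here is a minimal base R sketch of the same query, assuming you would rather not load sqldf. It runs one aggregate() call per summary function and merges the results on the grouping column. The object names (co2, m1, m2, res) are only illustrative.

```r
# CO2 is a built-in dataset; drop its extra grouped-data classes first.
co2 <- as.data.frame(CO2)

# One aggregate() call per summary function (formula interface).
m1 <- aggregate(conc ~ Treatment, data = co2, FUN = mean)
m2 <- aggregate(uptake ~ Treatment, data = co2, FUN = median)

# Join the two summaries on the grouping column.
res <- merge(m1, m2, by = "Treatment")
res
```

This scales poorly to many columns, since each extra summary needs its own call and merge, but it needs nothing beyond base R.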

Re: aggregate(...) with multiple functions

djmuseR
In reply to this post by Murat Tasan-3
Hi:

A nice package for doing this sort of thing is doBy. Let's manufacture an
example since you didn't provide one:

set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                 x1 = rnorm(30),
                 x2 = rnorm(30, mean = 5),
                 x3 = rnorm(30, mean = 10, sd = 4))

# --Case 1: no grouping variables

# If there are no grouping variables, you can define a function to apply
# to each variable (column) with the apply() function.

f  <- function(x) c(mean(x), median(x))

# Apply to all numeric variables (not column 1):
apply(d[, -1], 2, f)
             x1       x2       x3
[1,] -0.0647788 4.813318 10.21010
[2,] -0.0881492 4.916123 10.68559


# The mean of each variable is in the first row, the median in the second.


# --Case 2: one or more grouping variables

library(doBy)

# If you have grouping variables, you can create a function with
# names to apply to each variable groupwise. Notice that I named the
# output variables mean and median, normally a no-no, and watch what
# happens when it is used in summaryBy().

# Define the output function to apply to each variable
f2 <- function(x) c(mean = mean(x), median = median(x))

# The leading dot on the left-hand side of the formula in summaryBy()
# indicates that the summary function is to be applied to all variables
# not on the RHS of the formula:

summaryBy(. ~ g, data = d, FUN = f2)
  g     x1.mean   x1.median  x2.mean x2.median   x3.mean x3.median
1 a  0.04571262 -0.06361278 4.253444  4.223015 11.259677  11.06834
2 b -0.15746011 -0.14223959 4.913657  5.116526 10.037674  11.32120
3 c -0.08258890 -0.06227865 5.272853  5.524493  9.332942  10.14600

You can use multiple grouping variables in the formula if desired. The
function is meant to be applied to each LHS variable in each subgroup.
It is required that the input object of summaryBy() be a data frame.

The doBy package comes with a well-written vignette, wherein all of this
is well described.
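If you really want a different function for each column, as in the original question, the following is a minimal base R sketch. The helper agg_by_col() is hypothetical (it is not part of doBy or base R) and takes a named list that maps column names to functions.

```r
# Hypothetical helper: apply a different summary function to each column,
# grouped by a single factor, using only base R.
agg_by_col <- function(data, group, funs) {
  # funs is a named list: names are column names, values are functions.
  cols <- lapply(names(funs), function(col) {
    as.vector(tapply(data[[col]], group, funs[[col]]))
  })
  names(cols) <- names(funs)
  # tapply() returns one value per factor level, in level order.
  data.frame(group = levels(as.factor(group)), cols)
}

set.seed(126)
d <- data.frame(g = rep(letters[1:3], each = 10),
                x1 = rnorm(30),
                x2 = rnorm(30, mean = 5))

# Mean of x1 and median of x2 within each level of g.
res <- agg_by_col(d, d$g, list(x1 = mean, x2 = median))
res
```

The named list keeps the column-to-function mapping in one place, so adding a column means adding one list entry rather than another tapply() call.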

HTH,
Dennis


Re: aggregate(...) with multiple functions

PIKAL Petr
In reply to this post by Gabor Grothendieck
Hi
[hidden email] wrote on 16.07.2010 05:02:52:

> On Thu, Jul 15, 2010 at 10:45 PM, Murat Tasan <[hidden email]> wrote:
> > what i would like is for aggregate to take a list of functions for
> > columns, something like:
> >
> > aggregate(x, list(f), list(mean, median))

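If you want to use aggregate(), have the function return both statistics,
combined with cbind(). Note that this applies both functions to every column: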

aggregate(data, list(y), function(x) cbind(mean(x), median(x)))
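A runnable sketch of the same idea on the built-in CO2 data follows. It is an illustration under assumptions: the dataset and the names co2, res and out are my choice, since data and y above are placeholders. Each summarised column comes back as a two-column matrix, so you can pick the statistic you want per column afterwards.

```r
# Built-in CO2 data; drop its extra grouped-data classes first.
co2 <- as.data.frame(CO2)

# Apply both functions to every selected column, grouped by Treatment.
res <- aggregate(co2[, c("conc", "uptake")],
                 list(Treatment = co2$Treatment),
                 function(x) c(mean = mean(x), median = median(x)))

# res$conc and res$uptake are matrices with columns "mean" and "median";
# keep the mean of conc and the median of uptake.
out <- data.frame(Treatment     = res$Treatment,
                  conc.mean     = res$conc[, "mean"],
                  uptake.median = res$uptake[, "median"])
out
```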

Regards
Petr


