aggregate vs tapply; is there a middle ground?

Joseph LeBouton
Dear all,

I want to run a series of comparisons among 4 categorical variables:

a <- aggregate(y, list(var1, var2, var3, var4), sum)

This gets me a very nice 2-dimensional data frame with one column per
variable, BUT, as the help for aggregate says, "empty subsets are
removed".  I don't see in help(aggregate) how I can change this.

In contrast,
a <- tapply(y, list(var1, var2, var3, var4), sum)

gives me results for everything including empty subsets, but in an
awkward 4-dimensional array that takes me another 10 lines of
inefficient code to turn into a 2D data.frame.

Is there a way to directly do this calculation INCLUDING results for
empty subsets, and still obtain a 2D array, matrix, or data.frame?  OR
alternatively is there a simple way to mush the 4D result from the
tapply into a 2D matrix/data.frame?

thanks very much in advance for any help!

-jlb

--
************************************
Joseph P. LeBouton
Forest Ecology PhD Candidate
Department of Forestry
Michigan State University
East Lansing, Michigan 48824

Office phone: 517-355-7744
email: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
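
Editorial note on the question above: later R releases give the data frame method of aggregate() a `drop` argument, which was not available when this thread was written. A minimal sketch, assuming an R version that has it; the toy vectors below are hypothetical and only mirror the names used in the post:

```r
# Hypothetical toy data: two grouping factors; the combination (b, y) is empty
var1 <- factor(c("a", "a", "b"))
var2 <- factor(c("x", "y", "x"))
y    <- c(1, 2, 3)

# drop = FALSE asks aggregate() to keep unused combinations of the
# grouping values instead of removing them (assumes a recent R version)
a <- aggregate(y, list(var1 = var1, var2 = var2), sum, drop = FALSE)
print(a)
```

The result stays a 2-dimensional data frame, with one row per combination of grouping levels, including the empty one.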

Re: aggregate vs tapply; is there a middle ground?

Joseph LeBouton
Thanks, Phil!  I've literally spent two hours on my own trying to find
something that does exactly that.  Thanks for another pair of functions
added to my (slowly!) growing R vocabulary.

-jlb

Phil Spector wrote:

> Joseph -
>    I'm sure there are clearer and more efficient ways to do it, but
> here's something that seems to do what you want:
>
> z = tapply(y, list(var1, var2, var3, var4), sum)
> data.frame(do.call('expand.grid', dimnames(z)), y = do.call('rbind', as.list(z)))
>
>
>                                        - Phil Spector
>                      Statistical Computing Facility
>                      Department of Statistics
>                      UC Berkeley
>                      [hidden email]
>
>
> On Sat, 11 Feb 2006, Joseph LeBouton wrote:
>
>> Dear all,
>>
>> I'm wanting to do a series of comparisons among 4 categorical variables:
>>
>> a <- aggregate(y, list(var1, var2, var3, var4), sum)
>>
>> This gets me a very nice 2-dimensional data frame with one column per
>> variable, BUT, as help for aggregate says, <<empty subsets are
>> removed>>.  I don't see in help(aggregate) how I can change this.
>>
>> In contrast,
>> a <- tapply(y, list(var1, var2, var3, var4), sum)
>>
>> gives me results for everything including empty subsets, but in an
>> awkward 4-dimensional array that takes me another 10 lines of
>> inefficient code to turn into a 2D data.frame.
>>
>> Is there a way to directly do this calculation INCLUDING results for
>> empty subsets, and still obtain a 2D array, matrix, or data.frame?  OR
>> alternatively is there a simple way to mush the 4D result from the
>> tapply into a 2D matrix/data.frame?
>>
>> thanks very much in advance for any help!
>>
>> -jlb

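Editorial note: the two-line reshaping from Phil Spector's reply can be tried on a small example. The sketch below uses hypothetical toy data with two grouping factors instead of four, and names the list elements so the output columns are labelled. It also shows as.data.frame() applied to as.table(z) as one possible alternative; that variant is not from the thread.

```r
# Hypothetical toy data: the combination (b, y) has no observations
var1 <- factor(c("a", "a", "b"))
var2 <- factor(c("x", "y", "x"))
y    <- c(1, 2, 3)

# tapply keeps empty cells (as NA) but returns an array
z <- tapply(y, list(var1 = var1, var2 = var2), sum)

# Approach from the reply: one row per cell, first factor varying fastest,
# which matches the column-major order of the array elements
long1 <- data.frame(do.call("expand.grid", dimnames(z)),
                    y = do.call("rbind", as.list(z)))

# Alternative sketch: treat the array as a table and flatten it
long2 <- as.data.frame(as.table(z), responseName = "y")

print(long1)
print(long2)
```

Both data frames have one row per combination of levels, with NA in the row for the empty cell.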