

Okay everyone, here's a likely softball for someone.
Consider the following data frame:
# Create data
x <- rep(c(1,15),10)
y <- rnorm(20)
z <- c(rep("auto",10),rep("bus",10))
a <- rep(c(1,1,2,2,3,3,4,4,5,5),2)
# Create data frame
Df <- data.frame(Source=x,Rate=y,Bin=a,Type=z)
I want to create a new column that equals the sum of the Rates for each type (1,15) by Bin.
A related question: I have been using R for a while now and usually manipulate my data in data frames, but I know lists are better for R, so perhaps the above should be done using lists. Feel free to offer suggestions coming from that angle.
Thanks guys
JR


I will hit my own ball on this one
tapply(Df$Rate,list(Df$Bin,Df$Type),sum)


One possibility is:
library(doBy)
summaryBy(Rate~Source+Bin, data=Df, FUN=sum)
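For comparison, a rough base R equivalent that needs no extra package is aggregate() with the same formula. This is only a sketch: set.seed(42) and the name 'agg' are my own additions to make the example reproducible and are not part of the original post.

```r
# Rebuild the example data (seed added here only for reproducibility)
set.seed(42)
x <- rep(c(1, 15), 10)
y <- rnorm(20)
z <- c(rep("auto", 10), rep("bus", 10))
a <- rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2)
Df <- data.frame(Source = x, Rate = y, Bin = a, Type = z)

# Sum Rate within each Source/Bin combination
agg <- aggregate(Rate ~ Source + Bin, data = Df, FUN = sum)
agg
```

The result has one row per Source/Bin combination, with the sums in the 'Rate' column.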
On 5/17/2011 12:48 PM, LCOG1 wrote:
> I want to create a new column that equals the sum of the Rates for each type
> (1,15) by Bin.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Like this?
library(plyr)  # provides ddply, dlply, summarise and mutate
x <- rep(c(1,15),10)
y <- rnorm(20)
z <- c(rep("auto",10),rep("bus",10))
a <- rep(c(1,1,2,2,3,3,4,4,5,5),2)
# Create data frame
Df <- data.frame(Source=x,Rate=y,Bin=a,Type=z)
Df
# One row per Type/Bin combination with the summed Rate
ddply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))
# Adding a column to Df
ddply(Df,c('Type','Bin'),mutate,Summed=sum(Rate))
# Convert the result to a list
dlply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx
----- Original Message -----
> From: LCOG1 < [hidden email]>
> To: [hidden email]
> Sent: Tue, May 17, 2011 9:48:36 AM
> Subject: [R] Summarize by two or more attributes
> I want to create a new column that equals the sum of the Rates for each type
> (1,15) by Bin.


On May 17, 2011, at 11:48 AM, LCOG1 wrote:
> I want to create a new column that equals the sum of the Rates for each type
> (1,15) by Bin.
>
> A related question: I have been using R for a while now and usually
> manipulate my data in data frames, but I know lists are better for R, so
> perhaps the above should be done using lists. Feel free to offer
> suggestions coming from that angle.
See ?ave and consider:
# Presuming you want 'Bin' nested within 'Source'
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)
# Or 'Source' nested within 'Bin'
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)
On your follow-up, a data frame is a type of list with a 'data.frame' class attribute, a 'row.names' attribute and a 'names' attribute for the column names, much like a matrix is a vector with a 'dim' attribute.
Try this:
unclass(Df)
and see the output. It looks just like a list, because it is...
If you are dealing with 'rectangular' datasets (e.g. a database table), where each column may need to be of a different data type, a data frame in R is specifically designed to handle it. It can do this precisely because a data frame is a list, and each element in a list can be of a different type.
If you need to deal with a data structure that may not be entirely based upon a rectangular data set and may need to contain various numbers of items per element, then a list is the way to go. Lists are commonly used in R functions to return complex objects that may contain vectors of various types, matrices, data frames and even lists of lists.
A quick example would be objects returned by R's model functions. Run example(lm) and after the graphs finish, use str(lm.D9) to give an example of the structure of a somewhat complex list object.
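A small sketch of the points above, in case it helps to see them at the console. The tiny data frame and the names 'm' and 'v' are made up for illustration only.

```r
# A small data frame with columns of different types
Df <- data.frame(Source = c(1, 15), Rate = c(0.3, -1.2), Type = c("auto", "bus"))

is.list(Df)              # a data frame is a list
sapply(Df, class)        # each list element (column) has its own type
names(attributes(Df))    # includes "names", "class" and "row.names"
length(Df) == ncol(Df)   # the list elements are the columns

# Likewise, a matrix is a vector carrying a 'dim' attribute
m <- matrix(1:6, nrow = 2)
attributes(m)            # only 'dim'
v <- m
dim(v) <- NULL           # drop 'dim' and a plain vector is left
is.vector(v)
```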
HTH,
Marc Schwartz


On May 17, 2011, at 12:53 PM, LCOG1 wrote:
> I will hit my own ball on this one
>
>
> tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
>
Aha....you had mentioned creating a new column in your initial post, presumably added to 'Df', as opposed to creating a new independent matrix of the results.
Your output above creates a 5 x 2 matrix of the resultant sums, one column per 'Type' and one row for each 'Bin'.
The use of ave(), now based upon your above:
ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)
would yield a vector of length 20, which could then be added to the original 'Df' as a new column. The vector would be ordered in such a fashion as to match up with the original rows, based upon Bin and Type.
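To make the difference concrete, here is a minimal sketch that builds both results and checks that they agree. The seed and the names 'sums' and 'lookup' are my own additions for illustration.

```r
# Rebuild the example data (seed added here only for reproducibility)
set.seed(1)
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# tapply() gives a Bin x Type matrix of sums
sums <- tapply(Df$Rate, list(Df$Bin, Df$Type), sum)
dim(sums)

# ave() gives one value per original row, so it can be stored as a column
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)
length(Df$Sum)

# Each row's Sum should match the corresponding cell of the matrix
lookup <- sums[cbind(as.character(Df$Bin), as.character(Df$Type))]
all.equal(Df$Sum, lookup)
```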
I am tempted to quote a famous line from Cool Hand Luke, but I'll leave that for now... :)
Regards,
Marc Schwartz


Marc,
How could I also apply the spline function to each of the 'columns' found in the result from
tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
??
-----Original Message-----
From: Marc Schwartz [mailto: [hidden email]]
Sent: Tuesday, May 17, 2011 12:42 PM
To: ROLL Josh F
Cc: [hidden email]
Subject: Re: [R] Summarize by two or more attributes


On May 17, 2011, at 2:55 PM, ROLL Josh F wrote:
> Marc,
> How could I also apply the spline function to each of the 'columns' found in the result from
>
> tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
>
> ??
Something along the lines of the following:
apply(tapply(Df$Rate,list(Df$Bin,Df$Type),sum), 2, spline)
If I am understanding what you want to do.
Depending upon what you are trying to do, you may want to look at the other functions listed in the See Also in ?spline.
HTH,
Marc


I will take a look. In my real data I need to interpolate the 16 points into 64 points for each of the categories.
Thanks Marc
JR
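For the 16-to-64 interpolation mentioned above, one possible sketch is to pass n = 64 to spline() and keep only the interpolated y values for each column. The matrix 'sums' below is random stand-in data with 16 rows and two categories, not the real data, and the bins are assumed to be equally spaced at 1 to 16.

```r
# Hypothetical stand-in for the real summary: 16 bins by 2 categories
set.seed(1)
sums <- matrix(rnorm(32), nrow = 16,
               dimnames = list(1:16, c("auto", "bus")))

# Interpolate each column onto 64 equally spaced points spanning the bins
interp <- apply(sums, 2, function(col) {
  spline(seq_along(col), col, n = 64)$y
})

dim(interp)   # one row per interpolated point, one column per category
```

The default method of spline() is "fmm"; see ?spline for the alternatives if a different kind of spline is wanted.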
-----Original Message-----
From: Marc Schwartz [mailto: [hidden email]]
Sent: Tuesday, May 17, 2011 1:09 PM
To: ROLL Josh F
Cc: [hidden email]
Subject: Re: [R] Summarize by two or more attributes

