Summarize by two or more attributes

Josh R.
Okay everyone, here's a likely softball for someone.

Consider the following data frame:

#Create data
x<-rep(c(1,15),10)
y<-rnorm(20)
z<-c(rep("auto",10),rep("bus",10))
a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
#Create Data frame
Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)


I want to create a new column that equals the sum of the Rates for each type (1,15) by Bin.

A related question: I have been using R for a while now and usually manipulate my data in data frames, but I know lists are better for R, so perhaps the above should be done using lists. Feel free to offer suggestions coming from that angle.

Thanks guys

JR-

Re: Summarize by two or more attributes

Josh R.
I will hit my own ball on this one


tapply(Df$Rate,list(Df$Bin,Df$Type),sum)

Re: Summarize by two or more attributes

Abhijit Dasgupta, PhD
In reply to this post by Josh R.
One possibility is:

library(doBy)
summaryBy(Rate~Source+Bin, data=Df, FUN=sum)
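
For readers without the doBy package, a base R sketch of the same grouped sum might use aggregate(). The set.seed() call is an addition here (the original post does not set one) and is only there so the numbers are repeatable.

```r
# Rebuild the example data from the original post (seed added for repeatability)
set.seed(1)
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# One row per Source/Bin combination, holding the summed Rate
sums <- aggregate(Rate ~ Source + Bin, data = Df, FUN = sum)
sums
```

This returns a separate summary data frame rather than a new column on Df, so it fits when a compact table of sums is what is wanted.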


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Summarize by two or more attributes

Felipe Carrillo
In reply to this post by Josh R.
Like this?

x<-rep(c(1,15),10)
y<-rnorm(20)
z<-c(rep("auto",10),rep("bus",10))
a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
#Create Data frame
Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)
Df

library(plyr)  # provides ddply(), dlply(), summarise and mutate
ddply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))

 # Adding a column to Df
ddply(Df,c('Type','Bin'),mutate,Summed=sum(Rate))
 
# Convert the result to a list
dlply(Df,c('Type','Bin'),summarise,Summed=sum(Rate))


 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx





Re: Summarize by two or more attributes

Marc Schwartz-3
In reply to this post by Josh R.

See ?ave and consider:

# Presuming you want 'Bin' nested within 'Source'
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)

# Or 'Source' nested within 'Bin'
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)
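
As a quick check on the two calls above, the sketch below suggests that the order of the grouping variables does not change what ave() returns, since the groups are the same Source/Bin combinations either way. The seed and the variable names by_source_bin and by_bin_source are additions for illustration.

```r
# Rebuild the example data from the original post (seed added for repeatability)
set.seed(1)
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# Group sums, repeated on every row of the group, in the original row order
by_source_bin <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)
by_bin_source <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)

# Same groups either way, so the two vectors should match
identical(by_source_bin, by_bin_source)
```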


On your follow-up: a data frame is a type of list with a 'data.frame' class attribute, a 'row.names' attribute and a 'names' attribute for the column names, much like a matrix is a vector with a 'dim' attribute.

Try this:

  unclass(Df)

and see the output. It looks just like a list, because it is...
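
A few more checks along the same lines, as a minimal sketch on a small made-up data frame (the two-row Df here is only for illustration):

```r
# Small illustrative data frame
Df <- data.frame(Source = c(1, 15), Type = c("auto", "bus"))

is.list(Df)            # a data frame passes the list test
class(Df)              # the class attribute marks it as a data frame
names(attributes(Df))  # names, class and row.names attributes
length(Df)             # length of the list, i.e. the number of columns

# List-style access works on columns
Df[["Type"]]

# Dropping the class leaves a plain list (plus the leftover attributes)
plain <- unclass(Df)
is.data.frame(plain)
is.list(plain)
```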

If dealing with 'rectangular' datasets (e.g., a database table), where each column may need to be of a different data type, a data frame in R is specifically designed to handle it. It can do this precisely because a data frame is a list, and each element in a list can be a different type.

If you need to deal with a data structure that may not be entirely based upon a rectangular data set and may need to contain various numbers of items per element, then a list is the way to go. Lists are commonly used in R functions to return complex objects that may contain vectors of various types, matrices, data frames and even lists of lists.

A quick example would be objects returned by R's model functions. Run example(lm) and after the graphs finish, use str(lm.D9) to give an example of the structure of a somewhat complex list object.
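
A short sketch of both points, for anyone who wants something smaller than example(lm). The list called ragged and the use of the built-in cars dataset are assumptions made here for illustration, not part of the original exchange.

```r
# A list can hold elements of different types and lengths
ragged <- list(counts = 1:3, label = "auto", limits = c(0, 64))
sapply(ragged, length)

# A fitted model is a list too; cars ships with R
fit <- lm(dist ~ speed, data = cars)
is.list(fit)
names(fit)

# Top-level structure only, to keep the output readable
str(fit, max.level = 1)
```

Components such as fit$coefficients (a numeric vector) and fit$model (a data frame) sit side by side in the same object, which a rectangular structure could not hold.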

HTH,

Marc Schwartz


Re: Summarize by two or more attributes

Marc Schwartz-3
In reply to this post by Josh R.
On May 17, 2011, at 12:53 PM, LCOG1 wrote:

> I will hit my own ball on this one
>
>
> tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
>


Aha....you had mentioned creating a new column in your initial post, presumably added to 'Df', as opposed to creating a new independent matrix of the results.

Your output above creates a 5 x 2 matrix of the resultant sums, one column per 'Type' and one row for each 'Bin'.

The use of ave(), now based upon your above:

  ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)

would yield a vector of length 20, which could then be added to the original 'Df' as a new column. The vector would be ordered in such a fashion as to match up with the original rows, based upon Bin and Type.
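
To make the difference concrete, here is a sketch that runs both calls on the example data and checks that each row's ave() value matches the corresponding cell of the tapply() matrix. The seed and the names sums_matrix and lookup are additions for illustration.

```r
# Rebuild the example data from the original post (seed added for repeatability)
set.seed(1)
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# tapply(): a 5 x 2 matrix, one row per Bin and one column per Type
sums_matrix <- tapply(Df$Rate, list(Df$Bin, Df$Type), sum)
dim(sums_matrix)

# ave(): one value per original row, so it can be attached as a column
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Type), FUN = sum)

# Look up each row's Bin/Type cell in the matrix by its dimnames
lookup <- sums_matrix[cbind(as.character(Df$Bin), as.character(Df$Type))]
all.equal(Df$Sum, lookup)
```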

I am tempted to quote a famous line from Cool Hand Luke, but I'll leave that for now...  :-)

Regards,

Marc Schwartz


Re: Summarize by two or more attributes

Josh R.
Marc,
  How could I also apply the spline function to each of the 'columns' found in the result from

tapply(Df$Rate,list(Df$Bin,Df$Type),sum)

??





Re: Summarize by two or more attributes

Marc Schwartz-3
On May 17, 2011, at 2:55 PM, ROLL Josh F wrote:

> Marc,
>  How could I also apply the spline function to each of the 'columns' found in the result   from
>
> tapply(Df$Rate,list(Df$Bin,Df$Type),sum)
>
> ??


Something along the lines of the following:

  apply(tapply(Df$Rate,list(Df$Bin,Df$Type),sum), 2, spline)


If I am understanding what you want to do.

Depending upon what you are trying to do, you may want to look at the other functions listed in the See Also in ?spline.
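
For reference, a sketch of what the apply() call above hands back on the example data: because spline() returns a list with x and y components, the result is a list with one such element per column of the matrix. The seed and the names sums_matrix and fits are additions for illustration.

```r
# Rebuild the example data from the original post (seed added for repeatability)
set.seed(1)
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

sums_matrix <- tapply(Df$Rate, list(Df$Bin, Df$Type), sum)

# With only one vector supplied, spline() treats the values as y
# and uses 1, 2, ..., length(y) as x
fits <- apply(sums_matrix, 2, spline)

names(fits)     # one element per Type
str(fits$auto)  # x and y vectors of the interpolated curve
```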

HTH,

Marc


Re: Summarize by two or more attributes

Josh R.
I will take a look.  In my real data I need to interpolate the 16 points into 64 points for each of the categories.  
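
One possible way to do that, sketched on made-up numbers since the real data is not shown in the thread: pass n = 64 to spline() and keep only the interpolated y values. The 16 x 2 matrix called sums and its column names are hypothetical stand-ins.

```r
set.seed(1)
# Hypothetical stand-in for the real data: 16 points for each of two categories
sums <- matrix(rnorm(32), nrow = 16,
               dimnames = list(NULL, c("auto", "bus")))

# Interpolate each column from 16 to 64 evenly spaced points with a cubic spline
interp <- apply(sums, 2, function(col) spline(seq_along(col), col, n = 64)$y)

dim(interp)  # 64 rows, one column per category
```

Because each call now returns a plain numeric vector of the same length, apply() collects the results into a matrix instead of a list.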

Thanks Marc

JR
