Sum by column

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Sum by column

peterfrancis
Dear List,

I have a question of convenience,

I am looking to sum the values of one column based on another column - a example may help explain better!

ED ECOCODE
21.809467 AA0101
36.229566 PA1201
51.861284 PA1201
11.36232 PA1201
27.264634 PA1201
12.261986 PA1201
46.519313 PA1201
7.815376 PA1201
2.810428 PA1201
13.478372 PA1201
35.670182 PA1301
27.128715 AT0801
19.010294 AT1201
15.475368 AT1201
18.597983 AT0101
29.292615 AT0101
6.749846 AT0101
14.981488 AT0101
14.93511 AT0101
14.93511 AT0101
21.040785 AT0101
8.271615 AT0101
12.94232 AT0101
6.749846 AT0101
15.484412 AT0101
29.644494 AT0101
43.211212 AT0101

So for AA0101 it would be = 21.809467
            AT1201 it would be = 19.010294+15.475368

etc

I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other.

This is stored in a dataframe called ecoregion, i understand people like having code to change but i have none as i am a relative beginner! Sorry in advance!

Thanks

Peter

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Sum by column

Joshua Wiley-2
Hi Peter,

R has some fairly flexible ways of passing values of some variable (X)
by another (the INDEX) to different FUNctions.  Here is an example
using your data:

## your email data, in convenient form
dat <- structure(list(ED = c(21.809467, 36.229566, 51.861284, 11.36232,
27.264634, 12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
35.670182, 27.128715, 19.010294, 15.475368, 18.597983, 29.292615,
6.749846, 14.981488, 14.93511, 14.93511, 21.040785, 8.271615,
12.94232, 6.749846, 15.484412, 29.644494, 43.211212), ECOCODE = structure(c(1L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 3L, 4L, 4L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AA0101",
"AT0101", "AT0801", "AT1201", "PA1201", "PA1301"), class = "factor")),
.Names = c("ED",
"ECOCODE"), class = "data.frame", row.names = c(NA, -27L))
## look at the structure of the data
str(dat)

## inside of "dat" (to avoid typing its name repeatedly)
## find the sum of ED at each level of ECOCODE
with(dat, tapply(X = ED, INDEX = ECOCODE, FUN = sum, na.rm = TRUE))

## should give something like
   AA0101    AT0101    AT0801    AT1201    PA1201    PA1301
 21.80947 236.83684  27.12871  34.48566 209.60328  35.67018

For documentation, look at:

?tapply
## similar in many ways though sometimes slightly more/less convenient
?by

Hope that helps,

Josh

On Wed, Jan 12, 2011 at 2:38 AM, Peter Francis <[hidden email]> wrote:

> Dear List,
>
> I have a question of convenience,
>
> I am looking to sum the values of one column based on another column - a example may help explain better!
>
> ED                      ECOCODE
> 21.809467       AA0101
> 36.229566       PA1201
> 51.861284       PA1201
> 11.36232        PA1201
> 27.264634       PA1201
> 12.261986       PA1201
> 46.519313       PA1201
> 7.815376        PA1201
> 2.810428        PA1201
> 13.478372       PA1201
> 35.670182       PA1301
> 27.128715       AT0801
> 19.010294       AT1201
> 15.475368       AT1201
> 18.597983       AT0101
> 29.292615       AT0101
> 6.749846        AT0101
> 14.981488       AT0101
> 14.93511        AT0101
> 14.93511        AT0101
> 21.040785       AT0101
> 8.271615        AT0101
> 12.94232        AT0101
> 6.749846        AT0101
> 15.484412       AT0101
> 29.644494       AT0101
> 43.211212       AT0101
>
> So for AA0101 it would be = 21.809467
>            AT1201 it would be = 19.010294+15.475368
>
> etc
>
> I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other.
>
> This is stored in a dataframe called ecoregion, i understand people like having code to change but i have none as i am a relative beginner! Sorry in advance!
>
> Thanks
>
> Peter
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Sum by column

David Winsemius
In reply to this post by peterfrancis
There are two functions you need to become familiar with:

?tapply
?ave

If you wanted these summed values to be placed in another column of  
the same dataframe, you would use ave. If you wanted a new structure  
(somewhat shorter) you would use tapply with sum as the function. E. g:

tapply(ecoregion$ED, ecoregion$ECOCODE, sum)

--
David.

On Jan 12, 2011, at 5:38 AM, Peter Francis wrote:

> Dear List,
>
> I have a question of convenience,
>
> I am looking to sum the values of one column based on another column  
> - a example may help explain better!
>
> ED ECOCODE
> 21.809467 AA0101
> 36.229566 PA1201
> 51.861284 PA1201
> 11.36232 PA1201
> 27.264634 PA1201
> 12.261986 PA1201
> 46.519313 PA1201
> 7.815376 PA1201
> 2.810428 PA1201
> 13.478372 PA1201
> 35.670182 PA1301
> 27.128715 AT0801
> 19.010294 AT1201
> 15.475368 AT1201
> 18.597983 AT0101
> 29.292615 AT0101
> 6.749846 AT0101
> 14.981488 AT0101
> 14.93511 AT0101
> 14.93511 AT0101
> 21.040785 AT0101
> 8.271615 AT0101
> 12.94232 AT0101
> 6.749846 AT0101
> 15.484412 AT0101
> 29.644494 AT0101
> 43.211212 AT0101
>
> So for AA0101 it would be = 21.809467
>    AT1201 it would be = 19.010294+15.475368
>
> etc
>
> I would then like to be able to output a table with ECOCODE in one  
> column and the sum of ED in the other.
>
> This is stored in a dataframe called ecoregion, i understand people  
> like having code to change but i have none as i am a relative  
> beginner! Sorry in advance!
>
> Thanks
>
> Peter
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Sum by column

peterfrancis
David and Josh,

Thanks very much for your help, it is much appreciated.

Peter


On 12 Jan 2011, at 14:28, David Winsemius wrote:

There are two functions you need to become familiar with:

?tapply
?ave

If you wanted these summed values to be placed in another column of the same dataframe, you would use ave. If you wanted a new structure (somewhat shorter) you would use tapply with sum as the function. E. g:

tapply(ecoregion$ED, ecoregion$ECOCODE, sum)

--
David.

On Jan 12, 2011, at 5:38 AM, Peter Francis wrote:

> Dear List,
>
> I have a question of convenience,
>
> I am looking to sum the values of one column based on another column - a example may help explain better!
>
> ED ECOCODE
> 21.809467 AA0101
> 36.229566 PA1201
> 51.861284 PA1201
> 11.36232 PA1201
> 27.264634 PA1201
> 12.261986 PA1201
> 46.519313 PA1201
> 7.815376 PA1201
> 2.810428 PA1201
> 13.478372 PA1201
> 35.670182 PA1301
> 27.128715 AT0801
> 19.010294 AT1201
> 15.475368 AT1201
> 18.597983 AT0101
> 29.292615 AT0101
> 6.749846 AT0101
> 14.981488 AT0101
> 14.93511 AT0101
> 14.93511 AT0101
> 21.040785 AT0101
> 8.271615 AT0101
> 12.94232 AT0101
> 6.749846 AT0101
> 15.484412 AT0101
> 29.644494 AT0101
> 43.211212 AT0101
>
> So for AA0101 it would be = 21.809467
>    AT1201 it would be = 19.010294+15.475368
>
> etc
>
> I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other.
>
> This is stored in a dataframe called ecoregion, i understand people like having code to change but i have none as i am a relative beginner! Sorry in advance!
>
> Thanks
>
> Peter
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Sum by column

Felipe Carrillo
In reply to this post by peterfrancis
Or with ddply :
library(plyr)
dat <- structure(list(ED = c(21.809467, 36.229566, 51.861284, 11.36232,
27.264634, 12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
35.670182, 27.128715, 19.010294, 15.475368, 18.597983, 29.292615,
6.749846, 14.981488, 14.93511, 14.93511, 21.040785, 8.271615,
12.94232, 6.749846, 15.484412, 29.644494, 43.211212), ECOCODE = structure(c(1L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 3L, 4L, 4L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AA0101",
"AT0101", "AT0801", "AT1201", "PA1201", "PA1301"), class = "factor")),
.Names = c("ED",
"ECOCODE"), class = "data.frame", row.names = c(NA, -27L))
dat
ddply(dat,"ECOCODE",summarise,EDsummed=sum(ED))
 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx




----- Original Message ----

> From: Peter Francis <[hidden email]>
> To: [hidden email]
> Sent: Wed, January 12, 2011 2:38:19 AM
> Subject: [R] Sum by column
>
> Dear List,
>
> I have a question of convenience,
>
> I am looking to sum the values of one column based on another column - a
>example may help explain better!
>
> ED            ECOCODE
> 21.809467    AA0101
> 36.229566    PA1201
> 51.861284    PA1201
> 11.36232    PA1201
> 27.264634    PA1201
> 12.261986    PA1201
> 46.519313    PA1201
> 7.815376    PA1201
> 2.810428    PA1201
> 13.478372    PA1201
> 35.670182    PA1301
> 27.128715    AT0801
> 19.010294    AT1201
> 15.475368    AT1201
> 18.597983    AT0101
> 29.292615    AT0101
> 6.749846    AT0101
> 14.981488    AT0101
> 14.93511    AT0101
> 14.93511    AT0101
> 21.040785    AT0101
> 8.271615    AT0101
> 12.94232    AT0101
> 6.749846    AT0101
> 15.484412    AT0101
> 29.644494    AT0101
> 43.211212    AT0101
>
> So for AA0101 it would be = 21.809467
>         AT1201 it would be = 19.010294+15.475368
>
> etc
>
> I would then like to be able to output a table with ECOCODE in one column and
>the sum of ED in the other.
>
> This is stored in a dataframe called ecoregion, i understand people like having
>code to change but i have none as i am a relative beginner! Sorry in advance!
>
> Thanks
>
> Peter
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.