
sum specific rows in a data frame


arnaud Gaboury
I have a data frame called "pose":


          DESCRIPTION QUANITY CLOSING.PRICE
1       WHEAT May/10        1        467.75
2       WHEAT May/10        2        467.75
3       WHEAT May/10        1        467.75
4       WHEAT May/10        1        467.75
5 COTTON NO.2 May/10        1         78.13
6 COTTON NO.2 May/10        3         78.13
7 COTTON NO.2 May/10        1         78.13

I would like to sum the quantity for each category (i.e. WHEAT and
COTTON), but I have no idea how to write it in a simple way. The number of
rows will change every day.
Thanks for any help.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: sum specific rows in a data frame

Mohamed Lajnef
Hi Arnaud,

Try the aggregate function.
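A minimal sketch of that suggestion, assuming the pose data frame exactly as posted (including the column name QUANITY) and the formula interface of aggregate() available in current R. Because it groups by DESCRIPTION rather than by row positions, it keeps working as the number of rows changes from day to day:

```r
# Rebuild the posted "pose" data frame (column name QUANITY kept as in the post)
pose <- data.frame(
  DESCRIPTION   = rep(c("WHEAT May/10", "COTTON NO.2 May/10"), times = c(4, 3)),
  QUANITY       = c(1, 2, 1, 1, 1, 3, 1),
  CLOSING.PRICE = rep(c(467.75, 78.13), times = c(4, 3))
)

# Sum QUANITY within each DESCRIPTION, whatever the number of rows
totals <- aggregate(QUANITY ~ DESCRIPTION, data = pose, FUN = sum)
print(totals)
```

With the sample data both categories happen to total 5.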

regards
M



--


Mohamed Lajnef,IE
INSERM U955 eq 15
Pôle de Psychiatrie
Hôpital CHENEVIER
40, rue Mesly
94010 CRETEIL Cedex FRANCE
[hidden email]
tel : 01 49 81 31 31 (poste 18470)
Sec : 01 49 81 32 90
fax : 01 49 81 30 99


Re: sum specific rows in a data frame

arnaud Gaboury
In reply to this post by arnaud Gaboury
Thank you for your help. The best I have found is to use the ddply function.

> pose
          DESCRIPTION QUANITY CLOSING.PRICE
1       WHEAT May/10        1        467.75
2       WHEAT May/10        1        467.75
3       WHEAT May/10        1        467.75
4       WHEAT May/10        1        467.75
5 COTTON NO.2 May/10        1         78.13
6 COTTON NO.2 May/10        1         78.13
7 COTTON NO.2 May/10        1         78.13

> library(plyr)
> op <- ddply(pose, c("DESCRIPTION", "CLOSING.PRICE"), summarise, POSITION = sum(QUANITY))
> op
          DESCRIPTION CLOSING.PRICE POSITION
1 COTTON NO.2 May/10          78.13        3
2       WHEAT May/10         467.75        4

op is a data.frame object. The trick is done!
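For comparison, a base-R sketch that mirrors the ddply() call with aggregate(), grouping by both DESCRIPTION and CLOSING.PRICE. It assumes the pose data shown just above (every QUANITY equal to 1); the renaming to POSITION is only there to match the ddply output, and the row order may differ from ddply's:

```r
# "pose" as shown in this post (every QUANITY equal to 1)
pose <- data.frame(
  DESCRIPTION   = rep(c("WHEAT May/10", "COTTON NO.2 May/10"), times = c(4, 3)),
  QUANITY       = rep(1, 7),
  CLOSING.PRICE = rep(c(467.75, 78.13), times = c(4, 3))
)

# Group by both DESCRIPTION and CLOSING.PRICE, as in the ddply() call
op <- aggregate(QUANITY ~ DESCRIPTION + CLOSING.PRICE, data = pose, FUN = sum)
names(op)[names(op) == "QUANITY"] <- "POSITION"   # match the ddply column name
print(op)
```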



***************************
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F


Re: sum specific rows in a data frame

Chuck-3
Depending on the size of the data frame and the operations you are
trying to perform, either aggregate or ddply may be faster. In the function
below, df plays the role of your data frame: one grouping column plus
numeric columns to sum.

Check out this code, which times aggregate and ddply on data frames of
different sizes.
============================
require(plyr)

# Time aggregate() and ddply() on a data frame with 45*n rows in three groups (A, B, C)
CompareAggregation <- function(n) {
    df <- data.frame(id = c(rep("A", 15*n), rep("B", 10*n), rep("C", 20*n)))
    df$fltval <- rnorm(nrow(df))               # floating-point values
    df$intval <- rbinom(nrow(df), 1000, 0.8)   # integer values
    t1 <- system.time(zz1 <- aggregate(list(fltsum = df$fltval, intsum = df$intval),
                                       list(id = df$id), sum))
    t2 <- system.time(zz2 <- ddply(df, .(id), function(x) c(sum(x$fltval), sum(x$intval))))
    # [[1]] is the user CPU time, in seconds
    return(c(agg = t1[[1]], ddply = t2[[1]]))
}

# n = 10, 100, ..., 1e5
z <- 10^seq(1, 5)
names(z) <- as.character(z)
res.df <- t(data.frame(lapply(z, CompareAggregation)))
print(res.df)
============================


On Apr 14, 11:43 am, "arnaud Gaboury" <[hidden email]>
wrote:
> Thank you for your help. The best I have found is to use the ddply function.
>
> > pose


Re: sum specific rows in a data frame

hadley wickham
On Thu, Apr 15, 2010 at 1:16 AM, Chuck <[hidden email]> wrote:
> Depending on the size of the dataframe and the operations you are
> trying to perform, aggregate or ddply may be better.  In the function
> below, df has the same structure as your dataframe.

Current version of plyr (user time, seconds):

         agg  ddply
X10    0.005  0.007
X100   0.007  0.026
X1000  0.086  0.248
X10000 0.577  3.136
X1e.05 4.493 44.147

Development version of plyr (user time, seconds):

         agg ddply
X10    0.003 0.005
X100   0.007 0.007
X1000  0.042 0.044
X10000 0.410 0.443
X1e.05 4.479 4.237

So there are some big speed improvements in the works.
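To put a number on "big", a quick sketch that divides the current ddply timings by the development ones, using only the figures from the two tables in this message (n is the multiplier passed to CompareAggregation, so each data frame has 45*n rows):

```r
# ddply timings (seconds) copied from the two tables above
ddply_cur <- c(0.007, 0.026, 0.248, 3.136, 44.147)
ddply_dev <- c(0.005, 0.007, 0.044, 0.443, 4.237)

# Speed-up factor of the development version, per data frame size n
speedup <- round(ddply_cur / ddply_dev, 1)
names(speedup) <- c("10", "100", "1000", "10000", "1e+05")
print(speedup)
```

That works out to roughly 1.4x at n = 10, rising to about 10.4x at n = 1e5.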

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: sum specific rows in a data frame

Chuck-3
This is very cool, thanks Hadley. When are you planning to release that
version?

On Thu, Apr 15, 2010 at 9:09 AM, hadley wickham <[hidden email]> wrote:

> On Thu, Apr 15, 2010 at 1:16 AM, Chuck <[hidden email]> wrote:
> > Depending on the size of the dataframe and the operations you are
> > trying to perform, aggregate or ddply may be better.  In the function
> > below, df has the same structure as your dataframe.
>
> So there are some big speed improvements in the works.
>
> Hadley
>
>


Re: sum specific rows in a data frame

Jeff Newmiller
In reply to this post by arnaud Gaboury
This is good news, although I have recently run into what I consider excessive memory usage when adding key columns that don't affect the number of groups. For example, when grouping by Year and Month, if I also add MonthBegin, the POSIXct column from which Year and Month were derived, I run out of memory.
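A small hypothetical illustration of that situation (the data, column names and sizes are made up): adding MonthBegin as a key leaves the 24 observed groups unchanged, while the number of all possible key combinations grows to 576. Whether plyr's memory use is driven by something like that cross-product is an assumption here, not something established in this thread:

```r
# Hypothetical monthly data: 24 month starts, 30 rows per month
month_starts <- seq(as.POSIXct("2009-01-01", tz = "UTC"), by = "month", length.out = 24)
monthly <- data.frame(MonthBegin = rep(month_starts, each = 30))

# Year and Month are derived from MonthBegin, so MonthBegin adds no new grouping information
monthly$Year  <- as.integer(format(monthly$MonthBegin, "%Y", tz = "UTC"))
monthly$Month <- as.integer(format(monthly$MonthBegin, "%m", tz = "UTC"))

# Groups actually present in the data, with and without the redundant key
g2 <- unique(monthly[, c("Year", "Month")])
g3 <- unique(monthly[, c("Year", "Month", "MonthBegin")])

# Number of combinations if every key level were crossed with every other
n_cross <- length(unique(monthly$Year)) *
  length(unique(monthly$Month)) *
  length(unique(monthly$MonthBegin))

cat("observed groups:", nrow(g2), "vs", nrow(g3), "; cross-product:", n_cross, "\n")
```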


Re: sum specific rows in a data frame

hadley wickham
I think the development version also fixes that problem, but it's hard
to know without a reproducible example.

Hadley


Re: sum specific rows in a data frame

hadley wickham
In reply to this post by Chuck-3
The problem is that the new version of plyr is incompatible with
ggplot2, so I need to make some changes there before I can release it.
Hopefully this summer.

Hadley


Re: sum specific rows in a data frame

Matthew Dowle
In reply to this post by hadley wickham

Or try data.table 1.4 on R-Forge; its grouping is faster than aggregate:

         agg datatable
X10    0.012     0.008
X100   0.020     0.008
X1000  0.172     0.020
X10000 1.164     0.144
X1e.05 9.397     1.180

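install.packages("data.table", repos = "http://R-Forge.R-project.org")
require(data.table)
# df is the data frame built inside CompareAggregation(); this timing sits alongside t1 and t2
dt <- as.data.table(df)
t3 <- system.time(zz3 <- dt[, list(sumflt = sum(fltval), sumint = sum(intval)), by = id])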

Matthew

