ddply question

Felipe Carrillo
I apologize for cross-posting, but my question keeps bouncing back from the list.

Why doesn't pct work in this ddply call?
I am trying to get the percentage of 'TotalCount' by SampleDate and Age.
library(plyr)
b <- structure(list(SampleDate = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "5/8/1996", class = "factor"), TotalCount = c(1L,
2L, 1L, 1L, 4L, 3L, 1L, 10L, 3L), ForkLength = c(61L, 22L, NA,
NA, 72L, 34L, 100L, 23L, 25L), TotalSalvage = c(12L, 24L, 12L,
12L, 17L, 23L, 31L, 12L, 15L), Age = c(1L, 0L, NA, NA, 1L, 0L,
1L, 0L, 0L)), .Names = c("SampleDate", "TotalCount", "ForkLength",
"TotalSalvage", "Age"), class = "data.frame", row.names = c(NA,
-9L))
b
ddply(b,.(SampleDate,Age),summarise,salvage=sum(TotalSalvage),pct=TotalCount/sum(TotalCount))
Error: expecting result of length one, got : 4
 
#Computing TotalCount inside ddply works but the pct seems wrong...
ddply(b,.(SampleDate,Age),summarise,salvage=sum(TotalSalvage),Count=sum(TotalCount),pct=Count/sum(Count))

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: ddply question

Brian Diggs-2
On 8/30/2014 2:11 PM, Felipe Carrillo wrote:

>   library(plyr)
> b <- structure(list(SampleDate = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L), .Label = "5/8/1996", class = "factor"), TotalCount = c(1L,
> 2L, 1L, 1L, 4L, 3L, 1L, 10L, 3L), ForkLength = c(61L, 22L, NA,
> NA, 72L, 34L, 100L, 23L, 25L), TotalSalvage = c(12L, 24L, 12L,
> 12L, 17L, 23L, 31L, 12L, 15L), Age = c(1L, 0L, NA, NA, 1L, 0L,
> 1L, 0L, 0L)), .Names = c("SampleDate", "TotalCount", "ForkLength",
> "TotalSalvage", "Age"), class = "data.frame", row.names = c(NA,
> -9L))
> b
> ddply(b,.(SampleDate,Age),summarise,salvage=sum(TotalSalvage),pct=TotalCount/sum(TotalCount))
> Error: expecting result of length one, got : 4

I get a slightly different error:

Error: length(rows) == 1 is not TRUE

but the problem is the same. sum() returns a single value, while the
computation for pct returns a vector the same length as TotalCount (the
number of rows in that group's piece of b). summarise is designed to
take a data frame and reduce the number of rows in it by
aggregating/summarizing (some of) the columns. Since your two
computations give results of different lengths, it errors out. It
seems you don't want to reduce the number of rows, so replace summarise
with mutate. That function can handle return vectors of different
lengths and recycles the shorter ones as needed.

(The other difference between summarise and mutate is that mutate keeps
the original columns while summarise drops all original columns and
returns only the computed ones; this makes sense given that summarise
expects to return fewer rows than in the original data.)
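
As a rough sketch of the mutate version, assuming the same data as in the question (rebuilt here with data.frame() instead of the dput() output, and with the result name res chosen only for illustration):

```r
library(plyr)

# Same data as in the original post, rebuilt column by column.
b <- data.frame(
  SampleDate   = factor(rep("5/8/1996", 9)),
  TotalCount   = c(1L, 2L, 1L, 1L, 4L, 3L, 1L, 10L, 3L),
  ForkLength   = c(61L, 22L, NA, NA, 72L, 34L, 100L, 23L, 25L),
  TotalSalvage = c(12L, 24L, 12L, 12L, 17L, 23L, 31L, 12L, 15L),
  Age          = c(1L, 0L, NA, NA, 1L, 0L, 1L, 0L, 0L)
)

# mutate keeps every row of each SampleDate/Age piece:
#   salvage is a single value per group, recycled to all rows of the group;
#   pct is one value per row, the row's share of its group's TotalCount.
res <- ddply(b, .(SampleDate, Age), mutate,
             salvage = sum(TotalSalvage),
             pct     = TotalCount / sum(TotalCount))
res
```

The result should keep all 9 rows and the original columns, with salvage and pct appended. Note that the rows with Age = NA are treated as a group of their own.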

> #Computing TotalCount inside ddply works but the pct seems wrong...
> ddply(b,.(SampleDate,Age),summarise,salvage=sum(TotalSalvage),Count=sum(TotalCount),pct=Count/sum(Count))
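
A possible explanation for the second call, offered as a sketch rather than a definitive answer: inside summarise, Count is already a single number for each SampleDate/Age piece, so Count/sum(Count) divides that number by itself and pct comes out as 1 in every row. If the goal is each Age group's share of the total for its SampleDate (an assumption about what was intended), one option is to summarise first and then compute the percentage in a second pass grouped by SampleDate only. The names same and by_age below are chosen only for illustration.

```r
library(plyr)

# Only the columns needed here, same values as in the original post.
b <- data.frame(
  SampleDate   = factor(rep("5/8/1996", 9)),
  TotalCount   = c(1L, 2L, 1L, 1L, 4L, 3L, 1L, 10L, 3L),
  TotalSalvage = c(12L, 24L, 12L, 12L, 17L, 23L, 31L, 12L, 15L),
  Age          = c(1L, 0L, NA, NA, 1L, 0L, 1L, 0L, 0L)
)

# The call from the question: Count is a scalar within each piece,
# so sum(Count) equals Count and pct is 1 for every group.
same <- ddply(b, .(SampleDate, Age), summarise,
              salvage = sum(TotalSalvage),
              Count   = sum(TotalCount),
              pct     = Count / sum(Count))

# Step 1: one row per SampleDate/Age group.
by_age <- ddply(b, .(SampleDate, Age), summarise,
                salvage = sum(TotalSalvage),
                Count   = sum(TotalCount))

# Step 2: regroup by SampleDate alone, so sum(Count) is now the date total.
by_age <- ddply(by_age, .(SampleDate), mutate,
                pct = Count / sum(Count))
by_age
```

With this approach the pct values add up to 1 within each SampleDate instead of being 1 in every row.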


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
