getting summary statistics easily with dplyr

Christopher W. Ryan
I'm trying to modernize my way of thinking, and my coding, into the
dplyr/tidyverse way of doing things.

To get basic summary statistics on a variable in a dataframe, with the
output also being a dataframe, I previously would do something like this,
using other packages:

library(doBy)
library(Hmisc)  ## provides latex()
doBy.output <- summaryBy(mpg ~ am, data = mtcars, FUN = fivenum)
str(doBy.output)   ## yes, it's a dataframe
## which I would then incorporate into my report via Sweave and LaTeX
latex(doBy.output, file = "")

## Or this:

library(mosaic)
mosaic.output <- favstats(mpg ~ am, data = mtcars)
str(mosaic.output)  ## yes, it's a dataframe
latex(mosaic.output, file = "")


## What would be the "dplyr way" of doing this?  I know I could specify
each summary statistic individually:

library(dplyr)
dplyr.output <- mtcars %>%
  group_by(am) %>%
  summarise(min = min(mpg),
            p25 = quantile(mpg, probs = 0.25),
            p50 = median(mpg),
            p75 = quantile(mpg, probs = 0.75),
            max = max(mpg))
str(dplyr.output)  ## yes, it's a dataframe
latex(dplyr.output, file = "")
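One thing to keep in mind when comparing this with the fivenum() output: fivenum() reports Tukey's hinges, while quantile() uses its default type = 7 interpolation. The two coincide whenever the group size is odd (as for mtcars split by am, with 19 and 13 cars), but they can differ for an even number of observations. A small base-R check, using a made-up vector x:

```r
## fivenum() returns Tukey's hinges; quantile() defaults to type = 7
x <- c(1, 2, 3, 4)   ## illustrative data with an even number of values

tukey <- fivenum(x)
tukey                 ## 1.0 1.5 2.5 3.5 4.0

q7 <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
unname(q7)            ## 1.00 1.75 2.50 3.25 4.00
```

So the hand-written summarise() above is not guaranteed to reproduce fivenum() exactly on every data set.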

## Is there a way to use a single function like fivenum() instead of
specifying each desired summary statistic? dplyr's summarise() wants a
result of length 1, not 5:

dplyr.output.2 <- mtcars %>% group_by(am) %>% summarise(fivenum(mpg))  ## fivenum() returns 5 values, not 1
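A minimal sketch, assuming dplyr 1.0.0 or later: in those versions an unnamed data-frame result inside summarise() is unpacked into separate columns, so wrapping fivenum() in a helper that returns a one-row data frame keeps one row per group. The helper fivenum_df() and its column names are hypothetical, chosen here only for illustration.

```r
library(dplyr)

## Hypothetical helper: turn fivenum()'s 5 values into a one-row data frame
## with one named column per statistic
fivenum_df <- function(x) {
  f <- fivenum(x)
  data.frame(min = f[1], lower_hinge = f[2], median = f[3],
             upper_hinge = f[4], max = f[5])
}

## Assumes dplyr >= 1.0.0, where an unnamed data-frame result in
## summarise() is spliced into separate columns
five.output <- mtcars %>%
  group_by(am) %>%
  summarise(fivenum_df(mpg))

five.output
```

Because the helper returns exactly one row per group, this stays within the usual contract of summarise(); returning several rows per group from summarise() is deprecated in dplyr 1.1.0 and later in favour of reframe().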

group_map or group_modify seem like they might do the job, but I could
use some guidance on the syntax.
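For group_modify(), a sketch of the syntax, assuming a dplyr version that provides it: the formula-style function receives each group's rows as .x, must return a data frame, and the grouping column am is added back to the combined result. The column names are again illustrative, not anything dplyr requires.

```r
library(dplyr)

## group_modify() calls the function once per group; .x holds that group's
## rows (without the grouping column) and the function must return a data frame
gm.output <- mtcars %>%
  group_by(am) %>%
  group_modify(~ {
    f <- fivenum(.x$mpg)
    data.frame(min = f[1], lower_hinge = f[2], median = f[3],
               upper_hinge = f[4], max = f[5])
  }) %>%
  ungroup()   ## drop the grouping so the result is a plain tibble

gm.output
```

group_map() takes the same kind of function but returns a plain list with one element per group, so group_modify() is the closer fit when the goal is a single data frame for latex().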


Thanks.

--Chris Ryan


______________________________________________
R-help mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.