Avoid duplication in dplyr::summarise


Lars Bishop-2
Dear group,

Is there a way to avoid the sort of duplication illustrated below?
That is, I apply the same dplyr::summarise call after different group_by
arguments, so I'd like to create a single summarise function that can be
applied to both. My attempt below fails.

df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 = gl(3, 10, labels = letters[4:6]))


df %>%
  group_by(f1, f2) %>%
  summarise(x1m = mean(X1),
            x2m = mean(X2),
            x3m = mean(X3),
            x4m = mean(X4))

df %>%
  group_by(f1) %>%
  summarise(x1m = mean(X1),
            x2m = mean(X2),
            x3m = mean(X3),
            x4m = mean(X4))

# My failed attempt

s <- function() {
  dplyr::summarise(x1m = mean(X1),
                   x2m = mean(X2),
                   x3m = mean(X3),
                   x4m = mean(X4))
}

df %>%
  group_by(f1) %>% s
Error in s(.) : unused argument (.)

Regards,
Lars.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Avoid duplication in dplyr::summarise

Edjabou Vincent
Hi Lars

I am not sure exactly what you want. However, the following code lets you
(1) obtain a full summary of your data and (2) retrieve only the means of
the X values as a function of the factors f1 and f2.

library(tidyverse)
library(psych)
df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 = gl(3, 10, labels = letters[4:6]))

## Full summary of the data
df %>%
  gather(X_name, X_value, X1:X4) %>%
  group_by(f1, f2, X_name) %>%
  do(describe(.$X_value))

## Means only
df %>%
  gather(X_name, X_value, X1:X4) %>%
  group_by(f1, f2, X_name) %>%
  do(describe(.$X_value)) %>%
  select(mean) %>%  # keep only the mean column
  spread(X_name, mean)
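
If only the group means are needed, a shorter route may be to summarise all four columns in one call rather than reshaping. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for across() and the .groups argument). The seed, the 30-row matrix, the sampled f2, and the name means_by_group are assumptions made for this illustration and do not come from the original posts.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# 30 rows of data so the numeric columns need no recycling
df <- data.frame(matrix(rnorm(120), 30, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 = sample(letters[4:6], 30, replace = TRUE))

# One summarise() call covers X1..X4; the output columns are X1m, X2m, X3m, X4m
means_by_group <- df %>%
  group_by(f1, f2) %>%
  summarise(across(X1:X4, mean, .names = "{.col}m"), .groups = "drop")

means_by_group
```

The result has one row per observed combination of f1 and f2. Changing the group_by() call is the only edit needed to summarise by a different grouping.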

Vincent


On Sat, Sep 9, 2017 at 12:30 PM, Lars Bishop <[hidden email]> wrote:

> [...]

Re: Avoid duplication in dplyr::summarise

Eric Berger
Hi Lars,
Two comments:
1. You can achieve what you want with a slight modification of your
definition of s(), using the hint from the error message that you need an
argument '.':
s <- function(.) {
  dplyr::summarise(., x1m = mean(X1),
                   x2m = mean(X2),
                   x3m = mean(X3),
                   x4m = mean(X4))
}

2. The way you set up your two factors does not make a good test case,
because f1 and f2 vary together (a with d, b with e, c with f), so the two
group_by() calls produce identical groupings. An alternative that confirms
that the function s() does what you want might be:

df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = base::sample(letters[1:3],30,replace=TRUE),
                 f2 = base::sample(letters[4:6],30,replace=TRUE))
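
To check both points together, here is a minimal sketch. The pipe calls the function on its right with the left-hand side as the first argument, so the function needs a parameter to receive the data. The names summarise_means and data, the seed, and the 30-row matrix are assumptions made for this illustration and are not part of the original posts.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Independently sampled factors, so the two groupings really differ
df <- data.frame(matrix(rnorm(120), 30, 4),
                 f1 = sample(letters[1:3], 30, replace = TRUE),
                 f2 = sample(letters[4:6], 30, replace = TRUE))

# x %>% f() is evaluated as f(x): the first argument receives the grouped data
summarise_means <- function(data) {
  dplyr::summarise(data,
                   x1m = mean(X1),
                   x2m = mean(X2),
                   x3m = mean(X3),
                   x4m = mean(X4))
}

# The same function is reused after either grouping
by_f1    <- df %>% group_by(f1) %>% summarise_means()
by_f1_f2 <- df %>% group_by(f1, f2) %>% summarise_means()

# Equivalent call without the pipe
by_f1_direct <- summarise_means(group_by(df, f1))
```

Here by_f1 has one row per level of f1, and by_f1_f2 has one row per observed combination of f1 and f2. Naming the argument data instead of . behaves the same way and may be easier to read.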

HTH,

Eric

On Sat, Sep 9, 2017 at 1:52 PM, Edjabou Vincent <[hidden email]> wrote:

> [...]

Re: Avoid duplication in dplyr::summarise

Lars Bishop-2
Exactly what I was looking for, Eric. Thanks!

I agree with your second point.

Best,
Lars.

On Sat, Sep 9, 2017 at 9:02 AM, Eric Berger <[hidden email]> wrote:

> [...]