Can I pass the grouped portions of a dataframe/tibble to a function in dplyr

Chris Evans
Apologies if this is a stupid question, but my searches keep turning up things I already know and don't need.

What I want to do is use the group_by() power of dplyr to run functions that expect a data frame/tibble on each group, but I can't see how to do it. Here is a reproducible example.

### packages: dplyr for as_tibble(), group_by(), summarise() and %>%; tidyr for unnest_wider()
library(dplyr)
library(tidyr)

### create trivial tibble
n <- 50
x <- 1:n
y <- sample(1:3, n, replace = TRUE)
z <- rnorm(n)
tib <- as_tibble(cbind(x, y, z))

### create trivial function that expects a tibble/data frame
sillyFun <- function(tib){
  return(list(nrow = nrow(tib),
              ncol = ncol(tib)))
}

### works fine on the whole tibble
tib %>%
  summarise(dim = list(sillyFun(.))) %>%
  unnest_wider(dim)

That gives me:
# A tibble: 1 x 2
   nrow  ncol
  <int> <int>
1    50     3


### So I try the following hoping to apply the function to the grouped tibble
tib %>%
  group_by(y) %>%
  summarise(dim = list(sillyFun(.))) %>%
  unnest_wider(dim)

### But that gives me:
# A tibble: 3 x 3
      y  nrow  ncol
  <dbl> <int> <int>
1     1    50     3
2     2    50     3
3     3    50     3

Clearly "." is still passing the whole tibble, not the grouped subsets.  What I can't find is whether there is an alternative to "." that would pass just the grouped subset of the tibble.
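
In a magrittr pipe, "." stands for the whole left-hand side, which is why every group reported 50 rows. A minimal sketch of one alternative, assuming dplyr 1.1.0 or later (newer than this thread; in dplyr 1.0.x the now-deprecated cur_data() played a similar role): pick(everything()) inside summarise() returns only the current group's rows as a tibble. pick() leaves out the grouping column, so ncol comes out as 2. The seed and the use of tibble() instead of as_tibble(cbind(...)) are choices made for this sketch.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which provides pick()
library(tidyr)  # unnest_wider()

set.seed(2020)
n <- 50
tib <- tibble(x = 1:n, y = sample(1:3, n, replace = TRUE), z = rnorm(n))

sillyFun <- function(tib){
  list(nrow = nrow(tib), ncol = ncol(tib))
}

# pick(everything()) is evaluated once per group and returns only that
# group's rows (without the grouping column y), whereas "." is the whole tibble
res <- tib %>%
  group_by(y) %>%
  summarise(dim = list(sillyFun(pick(everything())))) %>%
  unnest_wider(dim)
res
```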

I have bodged my way around this by writing a function that takes individual columns and reassembles them into the data frame that the functions I actually need require. But that brings back a lot of clumsiness: selecting the variables to pass to the function in the dplyr call, and putting the reassemble-into-a-data-frame step inside the function I call. (The functions I really need are reliability explorations and can be called on whole data frames.)

I know I can do this using base R's split() and lapply(), but I feel sure it must be possible within dplyr/tidyverse. I'm slowly moving all my coding to the tidyverse. I'm hitting frustrations, but I'm also finding that it really helps me program more sensibly, handle relational data structures more easily, and write code that I read better when I come back to it after months on other things. If I could see how to do this, it would help.
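
For comparison, a minimal base R sketch of the split()/lapply() route mentioned above. It uses a plain data frame and a fixed seed, both assumptions made for the sketch. Unlike the dplyr approaches, split() keeps the grouping column in each piece, so ncol is 3 here.

```r
set.seed(2020)
n <- 50
dat <- data.frame(x = 1:n, y = sample(1:3, n, replace = TRUE), z = rnorm(n))

sillyFun <- function(tib){
  list(nrow = nrow(tib), ncol = ncol(tib))
}

# split() gives a named list with one data frame per value of y
pieces <- split(dat, dat$y)

# apply the function to each piece
res <- lapply(pieces, sillyFun)

# collect the per-group results into one data frame
out <- data.frame(
  y    = names(res),
  nrow = vapply(res, function(r) r$nrow, integer(1)),
  ncol = vapply(res, function(r) r$ncol, integer(1))
)
out
```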

Very sorry if the answer should be blindingly obvious. I'd also love pointers to tidyverse guidance written for people who aren't professional coders or statisticians, and that goes a bit beyond the obvious basics into issues like this.

TIA,

Chris

--
Small contribution in our coronavirus rigours:
https://www.coresystemtrust.org.uk/home/free-options-to-replace-paper-core-forms-during-the-coronavirus-pandemic/

Chris Evans <[hidden email]> Visiting Professor, University of Sheffield <[hidden email]>
I do some consultation work for the University of Roehampton <[hidden email]> and other places
but <[hidden email]> remains my main Email address.  I have a work web site at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/ 
   https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/

If you want an Emeeting, I am trying to keep them to Thursdays and my diary is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Can I pass the grouped portions of a dataframe/tibble to a function in dplyr

Rui Barradas
Hello,

You can pass a grouped tibble to a function with group_modify(), but the
function must return a data.frame (or similar).

## this will also do it
#sillyFun <- function(tib){
#  tibble(nrow = nrow(tib), ncol = ncol(tib))
#}


sillyFun <- function(tib){
   data.frame(nrow = nrow(tib), ncol = ncol(tib))
}

tib %>%
   group_by(y) %>%
   group_modify(~ sillyFun(.))
## A tibble: 3 x 3
## Groups:   y [3]
#      y  nrow  ncol
#  <dbl> <int> <int>
#1     1    17     2
#2     2    21     2
#3     3    12     2
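
A short sketch of one detail in the output above: ncol is 2 because group_modify() does not pass the grouping column to the function by default. Assuming dplyr 1.0.0 or later, the .keep = TRUE argument includes it. The data setup mirrors the seeded version in the follow-up message.

```r
library(dplyr)

set.seed(2020)
n <- 50
tib <- tibble(x = 1:n, y = sample(1:3, n, replace = TRUE), z = rnorm(n))

# group_modify() needs a function that returns a data frame
sillyFun <- function(tib){
  tibble(nrow = nrow(tib), ncol = ncol(tib))
}

# default: each group's slice excludes the grouping column y
without_y <- tib %>% group_by(y) %>% group_modify(~ sillyFun(.x))

# .keep = TRUE: the slice passed to sillyFun() also contains y
with_y <- tib %>% group_by(y) %>% group_modify(~ sillyFun(.x), .keep = TRUE)

without_y
with_y
```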


Hope this helps,

Rui Barradas

On 05/07/2020 at 09:43, Chris Evans wrote:

--
This e-mail has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

Re: Can I pass the grouped portions of a dataframe/tibble to a function in dplyr

Rui Barradas
Hello,

I forgot to say that I recreated the data set, setting the RNG seed first.



set.seed(2020)
n <- 50
x <- 1:n
y <- sample(1:3, n, replace = TRUE)
z <- rnorm(n)
tib <- tibble(x,y,z)


Also, don't do

as_tibble(cbind(...))
as.data.frame(cbind(...))


cbind() on separate vectors builds a matrix, and a matrix holds only one
type. So if one of the variables is of a different class (for example,
"character"), all variables are coerced to the lowest common denominator.
It's much better to call tibble() or data.frame() directly.
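
A minimal base R sketch of the coercion described above; the variable names are hypothetical. It also shows why y appeared as <dbl> in the original output: even when every input is numeric, cbind() turns integer columns into double.

```r
x   <- 1:3
lab <- c("a", "b", "c")   # hypothetical character variable
z   <- c(0.5, 1.5, 2.5)

# cbind() on plain vectors returns a matrix, and a matrix has one type,
# so the numbers are converted to character strings
m <- cbind(x, lab, z)
typeof(m)          # "character"

# data.frame() keeps a separate type for each column
df <- data.frame(x, lab, z, stringsAsFactors = FALSE)
sapply(df, class)  # x: "integer", lab: "character", z: "numeric"

# even with all-numeric inputs, cbind() turns the integer x into double
typeof(cbind(x, z))  # "double"
```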

Hope this helps,

Rui Barradas


On 05/07/2020 at 12:04, Rui Barradas wrote:

> Hello,
>
> You can pass a grouped tibble to a function with grouped_modify but the
> function must return a data.frame (or similar).
>
> ## this will also do it
> #sillyFun <- function(tib){
> #  tibble(nrow = nrow(tib), ncol = ncol(tib))
> #}
>
>
> sillyFun <- function(tib){
>    data.frame(nrow = nrow(tib), ncol = ncol(tib)))
> }
>
> tib %>%
>    group_by(y) %>%
>    group_modify(~ sillyFun(.))
> ## A tibble: 3 x 3
> ## Groups:   y [3]
> #      y  nrow  ncol
> #  <dbl> <int> <int>
> #1     1    17     2
> #2     2    21     2
> #3     3    12     2
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 09:43 de 05/07/2020, Chris Evans escreveu:
>> Apologies if this is a stupid question but searching keeps getting
>> things I know and don't need.
>>
>> What I want to do is to use the group-by() power of dplyr to run
>> functions that expect a dataframe/tibble per group but I can't see how
>> do it. Here is a reproducible example.
>>
>> ### create trivial tibble
>> n <- 50
>> x <- 1:n
>> y <- sample(1:3, n, replace = TRUE)
>> z <- rnorm(n)
>> tib <- as_tibble(cbind(x,y,z))
>>
>> ### create trivial function that expects a tibble/data frame
>> sillyFun <- function(tib){
>> return(list(nrow = nrow(tib),
>> ncol = ncol(tib)))
>> }
>>
>> ### works fine on the whole tibble
>> tib %>%
>> summarise(dim = list(sillyFun(.))) %>%
>> unnest_wider(dim)
>>
>> That gives me:
>> # A tibble: 1 x 2
>>     nrow  ncol
>>    <int> <int>
>> 1    50     3
>>
>>
>> ### So I try the following hoping to apply the function to the grouped
>> tibble
>> tib %>%
>> group_by(y) %>%
>> summarise(dim = list(sillyFun(.))) %>%
>> unnest_wider(dim)
>>
>> ### But that gives me:
>> # A tibble: 3 x 3
>>        y  nrow  ncol
>>    <dbl> <int> <int>
>> 1     1    50     3
>> 2     2    50     3
>> 3     3    50     3
>>
>> Clearly "." is still passing the whole tibble, not the grouped
>> subsets.  What I can't find is whether there is an alternative to "."
>> that would pass just the grouped subset of the tibble.
>>
>> I have bodged my way around this by writing a function that takes
>> individual columns and reassembles them into a data frame that the
>> actual functions I need to use require but that takes me back to a lot
>> of clumsiness both selecting the variables to pass in the dplyr call
>> to the function and putting the reassemble-to-data-frame bit in the
>> function I call.  (The functions I really need are reliability
>> explorations and can called on whole dataframes.)
>>
>> I know I can do this using base R split and lapply but I feel sure it
>> must be possible to do this within dplyr/tidyverse.  I'm slowly
>> transferring most of my code to the tidyverse and hitting frustrations
>> but also finding that it does really help me program more sensibly,
>> handle relational data structures more easily, and write code that I
>> seem better at reading when I come back to it after months on other
>> things so I am slowly trying to move all my coding to tidyverse.  If I
>> could see how to do this, it would help.
>>
>> Very sorry if the answer should be blindingly obvious to me.  I'd also
>> love to have pointers to guidance to the tidyverse written for people
>> who aren't professional coders or statisticians and that go a bit
>> beyond the obvious basics of tidyverse into issues like this.
>>
>> TIA,
>>
>> Chris
>>
>

--
Este e-mail foi verificado em termos de vírus pelo software antivírus Avast.
https://www.avast.com/antivirus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Can I pass the grouped portions of a dataframe/tibble to a function in dplyr

Chris Evans
Ouch. I should have known all those points, Rui: my bad. Casual behaviour while rushing out a little example. Good to be reminded.

group_modify() is clearly exactly what I wanted, and I will experiment with it to make sure I understand it properly. I see from the help that it evolved from, or supersedes aspects of, do(), which I think must have been the function I had forgotten. Even more interestingly, it seems to lead into options and experimental developments in the tidyverse that I didn't know about.
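
A related sketch, assuming dplyr 1.0.0 or later: group_map() is a sibling of group_modify() that collects whatever the function returns into a plain list, so the original list-returning sillyFun() can be used without rewriting it. Inside the lambda, .x is the group's rows and .y is a one-row tibble holding the group key. The seed and data setup are choices made for this sketch.

```r
library(dplyr)

set.seed(2020)
n <- 50
tib <- tibble(x = 1:n, y = sample(1:3, n, replace = TRUE), z = rnorm(n))

# the original list-returning version of the function
sillyFun <- function(tib){
  list(nrow = nrow(tib), ncol = ncol(tib))
}

# group_map() returns a list with one element per group;
# c() prepends the group key so each element records which group it is
res <- tib %>%
  group_by(y) %>%
  group_map(~ c(y = .y$y, sillyFun(.x)))

str(res[[1]])
```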

Excellent.  Perfect help ... many thanks!

Chris

----- Original Message -----
> From: "Rui Barradas" <[hidden email]>
> To: "Chris Evans" <[hidden email]>, "R-help" <[hidden email]>
> Sent: Sunday, 5 July, 2020 13:16:19
> Subject: Re: [R] Can I pass the grouped portions of a dataframe/tibble to a function in dplyr
