Is there a simple way to analyse all the data using dplyr?

Chris Evans
I am sure the answer is "yes" and I'm also sure the question may sound mad. Here's a reprex that I think captures what I'm doing:

library(dplyr)

n <- 500
gender <- sample(c("Man", "Woman", "Other"), n, replace = TRUE)
GPC_score <- rnorm(n)
scaleMeasures <- runif(n)
tibble(gender = gender,
       GPC_score = GPC_score,
       scaleMeasures = scaleMeasures) -> tibUse

### let's have the correlation between the two variables broken down by gender
tibUse %>%
  filter(gender != "Other") %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(cor = cor(cur_data())[1, 2]) -> tmp1

### but I'd also like the correlation for the whole dataset, not by gender;
### this is a kludge to achieve that, which I am using partly because I can't
### find the equivalent of cur_data() for an ungrouped tibble/df
tibUse %>%
  mutate(gender = "All") %>% # nasty kludge to get all the data!
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>% # ditto!
  summarise(cor = cor(cur_data())[1, 2]) -> tmp2

bind_rows(tmp1, tmp2)

### gets me what I want (the data are random, so the values differ between runs):
# A tibble: 3 x 2
#   gender    cor
#   <chr>   <dbl>
# 1 Man    0.0225
# 2 Woman  0.0685
# 3 All    0.0444
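A possible tidier version of the reprex above, offered only as a sketch: summarise() on an ungrouped tibble treats the whole data as a single group, so the "All" row can come from an ungrouped summarise() instead of mutate(gender = "All") plus group_by(). This assumes dplyr 1.1.0 or later, where pick() supersedes cur_data() (with older dplyr, cur_data() on the selected columns should play the same role); the seed is arbitrary and only there to make the sketch reproducible.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

# Per-gender correlations, as in tmp1 of the reprex
tmp1 <- tibUse %>%
  filter(gender != "Other") %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(cor = cor(pick(GPC_score, scaleMeasures))[1, 2])

# Whole-data correlation: an ungrouped tibble is summarised as one group,
# so no relabelling or group_by() is needed; "All" is just a constant column
tmp2 <- tibUse %>%
  na.omit() %>%
  summarise(gender = "All",
            cor = cor(pick(GPC_score, scaleMeasures))[1, 2])

result <- bind_rows(tmp1, tmp2)
result
```

Naming the columns inside pick() keeps character columns such as gender out of cor(), which would otherwise fail on an ungrouped tibble that still contains them.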

In reality I have some functions that are more complex than cor()[1, 2] (sorry about that particular kludge) and that take whole data frames as input, and I'd love to have a simpler way of doing this.
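Because the real functions take a data frame, group_modify() may fit better than summarise(): it passes each group's rows (without the grouping column) to a function and stacks the returned tibbles. A sketch under stated assumptions: analyseDf() is a hypothetical stand-in for the real analysis function and is assumed to return a one-row tibble; the same function is then called on the ungrouped data for the "All" row.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

# Hypothetical stand-in for a more complex function that digests a data frame
# and returns a one-row tibble of results
analyseDf <- function(df) {
  tibble(n = nrow(df),
         cor = cor(df$GPC_score, df$scaleMeasures))
}

# group_modify() calls analyseDf() once per gender and keeps the gender key
byGender <- tibUse %>%
  filter(gender != "Other") %>%
  group_by(gender) %>%
  group_modify(~ analyseDf(.x)) %>%
  ungroup()

# The same function applied directly to the whole, ungrouped data
allData <- analyseDf(tibUse) %>%
  mutate(gender = "All", .before = 1)

result <- bind_rows(byGender, allData)
result
```

The design point is that the analysis function never needs to know about grouping: it sees a plain data frame either way.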

So two questions:
1) I am sure there is a term/function that works on an ungrouped tibble in dplyr as cur_data() does for a grouped tibble ... but I can't find it.
2) I suspect someone has automated a way to get the analysis of the complete data after the analyses of the groups within a single dplyr run ... it seems an obvious and common use case, but I can't find that either.
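For question 2, one common idiom (a workaround rather than a dedicated dplyr verb) is to stack a relabelled copy of all rows under the original rows, so that "All" becomes just another group in a single pipeline. Sketch only; the seed is arbitrary and the n_obs column is added purely for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

result <- tibUse %>%
  # append a copy of every row relabelled "All"
  bind_rows(mutate(tibUse, gender = "All")) %>%
  # drops only the original "Other" rows; their "All" copies are kept
  filter(gender != "Other") %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(n_obs = n(),
            cor = cor(GPC_score, scaleMeasures))
result
```

Two caveats: the data are doubled in memory, and group_by() sorts the groups alphabetically, so "All" comes first unless gender is made a factor with the desired level order.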

Sorry, I'm over 99% sure I'm being stupid and missing the obvious here ... but that's the recurrent problem I have with my wetware, and searchware doesn't seem to be fixing it!

TIA,

Chris

--
Small contribution in our coronavirus rigours:
https://www.coresystemtrust.org.uk/home/free-options-to-replace-paper-core-forms-during-the-coronavirus-pandemic/

Chris Evans <[hidden email]> Visiting Professor, University of Sheffield <[hidden email]>
I do some consultation work for the University of Roehampton <[hidden email]> and other places
but <[hidden email]> remains my main Email address.  I have a work web site at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/ 
   https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/

If you want an Emeeting, I am trying to keep them to Thursdays and my diary is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Is there a simple way to analyse all the data using dplyr?

Eric Berger
Hi,
I am not sure if the request is about a 'simple way' or requires
dplyr. Here's an approach without using dplyr that is just 2 lines
(not counting creating the data or outputting the result).

n <- 500
myDf <- data.frame( gender=sample(c("Man","Woman","Other"), n, replace = TRUE),
                    GPC_score=rnorm(n), scaleMeasures=runif(n))
aL   <- list(Man="Man",Woman="Woman",All=c("Man","Woman","Other"))
z    <- sapply(seq_along(aL), function(i) { x <- myDf[myDf$gender %in% aL[[i]], ]; cor(x[, 2], x[, 3]) })
names(z) <- names(aL)
z
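The same idea, a named list of group definitions, also carries over to dplyr, which may suit the follow-up request for a tidyverse answer: bind_rows() with .id turns the list names into a column. A sketch only; the seed is arbitrary.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 500
myDf <- data.frame(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                   GPC_score = rnorm(n), scaleMeasures = runif(n))

# Same named list of group definitions as in the base R version
aL <- list(Man = "Man", Woman = "Woman", All = c("Man", "Woman", "Other"))

# One summary row per definition; .id = "gender" takes the labels from names(aL)
z <- bind_rows(
  lapply(aL, function(g) {
    myDf %>%
      filter(gender %in% g) %>%
      summarise(cor = cor(GPC_score, scaleMeasures))
  }),
  .id = "gender"
)
z
```

Any data-frame-digesting function can replace the summarise() step, as long as it returns a data frame.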

HTH,
Eric


On Mon, Sep 21, 2020 at 3:13 PM Chris Evans <[hidden email]> wrote:


Re: Is there a simple way to analyse all the data using dplyr?

Chris Evans
Thanks Eric,

That's very neat!  Sort of fits my belief about base R and telegrams (that's not knocking it, I really do respect it, my wetware is just not good at it).

For many reasons, particularly the convenience for formatting and passing on results from the real function I'm applying, I am really keen to find tidyverse/dplyr answers/options.  Any offers?!

TIA (all),

Chris

----- Original Message -----
> From: "Eric Berger" <[hidden email]>
> To: "Chris Evans" <[hidden email]>
> Cc: "r-help" <[hidden email]>
> Sent: Monday, 21 September, 2020 15:03:44
> Subject: Re: [R] Is there a simple way to analyse all the data using dplyr?
