combine filter() and select()

combine filter() and select()

Ivan Calandra-5
Dear useRs,

I'm new to the tidyverse world and I need some help on basic things.

I have the following tibble:
mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

I want to subset the rows with "a" in the column "files", and keep only
that column.

So I did:
myfile <- mytbl %>%
  filter(grepl("a", files)) %>%
  select(files)

It works, but I believe there must be an easier way to combine filter()
and select(), right?

Thank you!
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: combine filter() and select()

Chris Evans
Inline

----- Original Message -----
> From: "Ivan Calandra" <[hidden email]>
> To: "R-help" <[hidden email]>
> Sent: Wednesday, 19 August, 2020 16:56:32
> Subject: [R] combine filter() and select()

> Dear useRs,
>
> I'm new to the tidyverse world and I need some help on basic things.
>
> I have the following tibble:
> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
> 1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
>
> I want to subset the rows with "a" in the column "files", and keep only
> that column.
>
> So I did:
> myfile <- mytbl %>%
>  filter(grepl("a", files)) %>%
>  select(files)
>
> It works, but I believe there must be an easier way to combine filter()
> and select(), right?

I would write

mytbl %>%
  filter(grepl("a", files)) %>%
  select(files) -> myfile

as I like to keep a sort of "top to bottom and left to right" flow when writing in the tidyverse dialect of R, but that's really not important.

Apart from that, I think what you've done is "proper tidyverse". To me, another difference between the dialects is that classical R often seems to put value on, and make it easy, to do things with incredibly few characters.  I think that for the people who are brilliant at that sort of coding, and there are many on this list, that sort of code is also easy to read.  I know that Chinese is easy to read if you grew up on it, but to a bear of little brain like me, the much more verbose style of the tidyverse repays typing time with readability when I come back to my code and, though I have little experience of this yet, when I read other people's code.

What did you think wasn't "easy" about what you wrote?

Very best (all),

Chris

--
Small contribution in our coronavirus rigours:
https://www.coresystemtrust.org.uk/home/free-options-to-replace-paper-core-forms-during-the-coronavirus-pandemic/

Chris Evans <[hidden email]> Visiting Professor, University of Sheffield <[hidden email]>
I do some consultation work for the University of Roehampton <[hidden email]> and other places
but <[hidden email]> remains my main Email address.  I have a work web site at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/ 
   https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/

If you want an Emeeting, I am trying to keep them to Thursdays and my diary is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.

Re: combine filter() and select()

Jeff Newmiller
In reply to this post by Ivan Calandra-5
The whole point of dplyr primitives is to support data frames... that is, lists of columns. When you pare your data frame down to one column you are almost certainly using the wrong tool for the job.

So, sure, your code works... and it even does what you wanted in the dplyr style, but what a pointless exercise.

grep( "a", mytbl$files, value=TRUE )
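
For comparison, a minimal sketch of how the two results differ, assuming dplyr is installed: grep(value = TRUE) gives a bare character vector, while the filter()/select() pipeline gives a one-column tibble, and pull() turns that column back into a vector. The names vec and tbl are only for illustration.

```r
library(dplyr)

# The tibble from the original post
mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6),
                   row.names = c(NA, -6L),
                   class = c("tbl_df", "tbl", "data.frame"))

# Base R: a plain character vector holding the matching values
vec <- grep("a", mytbl$files, value = TRUE)

# dplyr: a one-column tibble holding the matching rows
tbl <- mytbl %>%
  filter(grepl("a", files)) %>%
  select(files)

# pull() extracts the column as a vector, so the values can be compared
identical(vec, pull(tbl, files))
```

Which form is preferable probably depends on whether the next step expects a vector or a data frame.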

--
Sent from my phone. Please excuse my brevity.

Re: combine filter() and select()

Ivan Calandra-5
In reply to this post by Chris Evans
Dear Chris,

I didn't think about having the assignment at the end as you showed; it
indeed fits the pipe workflow better.

By "easy", I actually meant shorter. As you said, in base R, I usually
do that in 1 line, so I was hoping to do the same in tidyverse. But I'm
glad to hear that I'm using tidyverse the proper way :)

Best regards,
Ivan

Re: combine filter() and select()

Ivan Calandra-5
In reply to this post by Jeff Newmiller
Hi Jeff,

The code you show is exactly what I usually do, in base R; but I wanted
to play with tidyverse to learn it (and also understand when it makes
sense and when it doesn't).

And yes, of course, in the example I gave, I end up with a 1-cell
tibble, which could be better extracted as a length-1 vector. But my
real goal is not to end up with a single value or even a single column.
I just thought that simplifying my example was the best approach to ask
for advice.

But thank you for letting me know that what I'm doing is pointless!

Ivan

Re: combine filter() and select()

Martin Morgan-4
A kind of hybrid answer is to use base::subset(), which supports non-standard evaluation (it looks up unquoted symbols, like 'files' in the code line below, in the object that is its first argument; %>% puts 'mytbl' in that first position) and does both row (filter) and column (select) subsetting.

> mytbl %>% subset(files %in% "a", files)
# A tibble: 1 x 1
  files
  <chr>
1 a

Or subset(grepl("a", files), files) if that was what you meant.

One important idea that the tidyverse implements is, in my opinion, 'endomorphism' -- you get back the same type of object as you put in -- so I wouldn't use a base R idiom that returned a vector unless that were somehow essential for the next step in the analysis.
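
A small sketch of that idea, assuming dplyr is installed (the out_* names are made up for illustration): the dplyr pipeline and subset() both hand back a tibble, whereas grep() hands back a character vector.

```r
library(dplyr)

# The tibble from the original post
mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6),
                   row.names = c(NA, -6L),
                   class = c("tbl_df", "tbl", "data.frame"))

# Tibble in, tibble out
out_dplyr  <- mytbl %>% filter(grepl("a", files)) %>% select(files)
out_subset <- mytbl %>% subset(grepl("a", files), files)

# Tibble column in, character vector out
out_grep <- grep("a", mytbl$files, value = TRUE)

# Inspect what kind of object each approach returned
class(out_dplyr)
class(out_subset)
class(out_grep)
```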

There is value in having separate functions for filter() and select(), and probably there are edge cases where filter(), select(), and subset() behave differently, but for what it's worth, subset() can also be used to perform these operations individually:

> mytbl %>% subset(, files)
# A tibble: 6 x 1
  files
  <chr>
1 a
2 b
3 c
4 d
5 e
6 f
> mytbl %>% subset(grepl("a", files), )
# A tibble: 1 x 2
  files  prop
  <chr> <int>
1 a         1

Martin Morgan

Re: combine filter() and select()

hadley wickham
In reply to this post by Ivan Calandra-5
On Wed, Aug 19, 2020 at 10:03 AM Ivan Calandra <[hidden email]> wrote:

>
> Dear useRs,
>
> I'm new to the tidyverse world and I need some help on basic things.
>
> I have the following tibble:
> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
> 1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
>
> I want to subset the rows with "a" in the column "files", and keep only
> that column.
>
> So I did:
> myfile <- mytbl %>%
>   filter(grepl("a", files)) %>%
>   select(files)
>
> It works, but I believe there must be an easier way to combine filter()
> and select(), right?

Not in the tidyverse. As others have mentioned, both [ and subset() in
base R allow you to simultaneously subset rows and columns, but
there's no single verb in the tidyverse that does both. This is
somewhat informed by the observation that in data frames, unlike
matrices, rows and columns are not exchangeable, and you typically
want to express subsetting in rather different ways.
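
A minimal sketch of the two routes, assuming dplyr is installed. The first uses base R `[` to take rows and columns in one call. The second wraps the two verbs in filter_select(), a hypothetical helper written here for illustration only; it is not part of dplyr or any other package.

```r
library(dplyr)

# The tibble from the original post
mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop = 1:6),
                   row.names = c(NA, -6L),
                   class = c("tbl_df", "tbl", "data.frame"))

# Base R `[`: row condition and column name in a single call
by_bracket <- mytbl[grepl("a", mytbl$files), "files"]

# Hypothetical helper (not a dplyr function): still two verbs underneath,
# with {{ }} passing the unquoted arguments through to filter() and select()
filter_select <- function(.data, rows, cols) {
  .data %>%
    filter({{ rows }}) %>%
    select({{ cols }})
}

by_helper <- filter_select(mytbl, grepl("a", files), files)

by_bracket
by_helper
```

The helper only saves typing; rows are still chosen by a logical condition and columns by name, which fits the point that the two kinds of subsetting are expressed differently.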

Hadley

--
http://hadley.nz
