filter() question

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

filter() question

Dr Eberhard Lisse
Hi,

I have a small test sample with lab reports (PAP smears) from a number
of different providers.  These have Collection Dates and the relevant
columns glimpse() something like this:

$ Provider       <chr> "Dr C", "Dr D", "Dr C", "Dr D"
$ CollectionDate <chr> "2016-11-03", "2016-11-02", "2016-11-03", "2016-11-03"


I am looking to find (filter) the reports which were collected in the
time period common to all providers?

Something like

         the largest First Common CollectionDate
and
         the smallest Last Common CollectionDate

How would I do that?

I can of course do this "manually", ie collect all Providers and their
first and last Collection dates and then find the Common First and Last
one, but wonder if there is an elegant way of doing this :-)-O



greetings, el

--
If you want to email me, replace nospam with el

Dr. Eberhard W. Lisse   \         /       Obstetrician & Gynaecologist
[hidden email]             / *      |  Telephone: +264 81 124 6733 (cell)
PO Box 8421 Bachbrecht  \      /  If this email is signed with GPG/PGP
10007, Namibia           ;____/ Sect 20 of Act No. 4 of 2019 may apply

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: filter() question

Rasmus Liland-3
On 2020-08-21 09:03 +0200, Dr Eberhard Lisse wrote:

> Hi,
>
> I have a small test sample with lab
> reports (PAP smears) from a number of
> different providers.  These have
> Collection Dates and the relevant
> columns glimpse() something like
> this:
>
> $ Provider       <chr> "Dr C", "Dr D", "Dr C", "Dr D"
> $ CollectionDate <chr> "2016-11-03", "2016-11-02", "2016-11-03", "2016-11-03"
>
> I am looking to find (filter) the
> reports which were collected in the
> time period common to all providers?
>
> Something like
>
> the largest First Common CollectionDate
> and
> the smallest Last Common CollectionDate
>
> How would I do that?
>
> I can of course do this "manually", ie
> collect all Providers and their first
> and last Collection dates and then
> find the Common First and Last one,
> but wonder if there is an elegant way
> of doing this :-)-O
Dear Eberhard,

Is each report in a csv file with those
two columns, and you want to unify them
into a dataframe with CollectionDate
along the rows, and other details for
each provider along the columns?  This
can be done with various apply calls and
reshape.  Can you please subset some
more example data here using dput.  It
makes it so much easier.

/Rasmus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: filter() question

Eric Berger
Hi Eberhard,
Here is one possibility using dplyr.

library(dplyr)
set.seed(3)

## set up some fake data
dtV <- as.Date("2020-08-01") + 0:4
x <- sample(dtV,20,repl=TRUE)
provider <- sample(LETTERS[1:3],20,repl=TRUE)
lDf <- data.frame(Provider=provider,CollectionDate=x,stringsAsFactors=FALSE)

## get min/max date for each provider
a <- lDf %>% dplyr::group_by( Provider ) %>%
  dplyr::mutate( minDt=min(CollectionDate), maxDt=max(CollectionDate)) %>%
  dplyr::summarize( u = min(minDt), v = max(maxDt) )

## get the common interval
c(max(a$u), min(a$v))

# [1] "2020-08-02" "2020-08-04"

HTH,
Eric


On Fri, Aug 21, 2020 at 12:34 PM Rasmus Liland <[hidden email]> wrote:

> On 2020-08-21 09:03 +0200, Dr Eberhard Lisse wrote:
> > Hi,
> >
> > I have a small test sample with lab
> > reports (PAP smears) from a number of
> > different providers.  These have
> > Collection Dates and the relevant
> > columns glimpse() something like
> > this:
> >
> > $ Provider       <chr> "Dr C", "Dr D", "Dr C", "Dr D"
> > $ CollectionDate <chr> "2016-11-03", "2016-11-02", "2016-11-03",
> "2016-11-03"
> >
> > I am looking to find (filter) the
> > reports which were collected in the
> > time period common to all providers?
> >
> > Something like
> >
> >        the largest First Common CollectionDate
> > and
> >        the smallest Last Common CollectionDate
> >
> > How would I do that?
> >
> > I can of course do this "manually", ie
> > collect all Providers and their first
> > and last Collection dates and then
> > find the Common First and Last one,
> > but wonder if there is an elegant way
> > of doing this :-)-O
>
> Dear Eberhard,
>
> Is each report in a csv file with those
> two columns, and you want to unify them
> into a dataframe with CollectionDate
> along the rows, and other details for
> each provider along the columns?  This
> can be done with various apply calls and
> reshape.  Can you please subset some
> more example data here using dput.  It
> makes it so much easier.
>
> /Rasmus
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: filter() question

Dr Eberhard Lisse
Eric, Rasmus,

thank you very much,

         ALLPAP  %>%
                 group_by(Provider) %>%
                 mutate( minDt=min(CollectionDate),
                         maxDt=max(CollectionDate)) %>%
                 summarize( minDt = min(minDt),
                         maxDt = max(maxDt), .groups="keep" ) %>%
                 ungroup() %>%
                 mutate(MAX_MIN_DATE = max(minDt),
                         MIN_MAX_DATE = min(maxDt)) %>%
                 distinct(MAX_MIN_DATE, MIN_MAX_DATE)

gives me

         # A tibble: 1 x 2
                MAX_MIN_DATE MIN_MAX_DATE
                <chr>        <chr>      
         1 2010-02-05   2019-08-30  

which is correct, and what I wanted.

This is so cool :-)-O

el

On 21/08/2020 11:41, Eric Berger wrote:
> Hi Eberhard,
> Here is one possibility using dplyr.
[...]
>
> HTH,
> Eric
>
>
> On Fri, Aug 21, 2020 at 12:34 PM Rasmus Liland <[hidden email]> wrote:
>> On 2020-08-21 09:03 +0200, Dr Eberhard Lisse wrote:
[...]

>>
>> Dear Eberhard,
>>
>> Is each report in a csv file with those two columns, and you want to
>> unify them into a dataframe with CollectionDate along the rows, and
>> other details for each provider along the columns?  This can be done
>> with various apply calls and reshape.  Can you please subset some
>> more example data here using dput.  It makes it so much easier.
>>
>> /Rasmus
[...]
--
Dr. Eberhard W. Lisse   \         /       Obstetrician & Gynaecologist
[hidden email]             / *      |  Telephone: +264 81 124 6733 (cell)
PO Box 8421 Bachbrecht  \      /  If this email is signed with GPG/PGP
10007, Namibia           ;____/ Sect 20 of Act No. 4 of 2019 may apply

--
If you want to email me, replace nospam with el

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: filter() question

Jeff Newmiller
In reply to this post by Eric Berger
Using mutate followed by summarise in this case is completely unnecessary.

a <- (   lDf
     %>% dplyr::group_by( Provider )
     %>% dplyr::summarise( u = min( CollectionDate )
                         ,, v = max( CollectionDate )
                         )
     )

On August 21, 2020 2:41:26 AM PDT, Eric Berger <[hidden email]> wrote:

>Hi Eberhard,
>Here is one possibility using dplyr.
>
>library(dplyr)
>set.seed(3)
>
>## set up some fake data
>dtV <- as.Date("2020-08-01") + 0:4
>x <- sample(dtV,20,repl=TRUE)
>provider <- sample(LETTERS[1:3],20,repl=TRUE)
>lDf <-
>data.frame(Provider=provider,CollectionDate=x,stringsAsFactors=FALSE)
>
>## get min/max date for each provider
>a <- lDf %>% dplyr::group_by( Provider ) %>%
>dplyr::mutate( minDt=min(CollectionDate), maxDt=max(CollectionDate))
>%>%
>  dplyr::summarize( u = min(minDt), v = max(maxDt) )
>
>## get the common interval
>c(max(a$u), min(a$v))
>
># [1] "2020-08-02" "2020-08-04"
>
>HTH,
>Eric
>
>
>On Fri, Aug 21, 2020 at 12:34 PM Rasmus Liland <[hidden email]> wrote:
>
>> On 2020-08-21 09:03 +0200, Dr Eberhard Lisse wrote:
>> > Hi,
>> >
>> > I have a small test sample with lab
>> > reports (PAP smears) from a number of
>> > different providers.  These have
>> > Collection Dates and the relevant
>> > columns glimpse() something like
>> > this:
>> >
>> > $ Provider       <chr> "Dr C", "Dr D", "Dr C", "Dr D"
>> > $ CollectionDate <chr> "2016-11-03", "2016-11-02", "2016-11-03",
>> "2016-11-03"
>> >
>> > I am looking to find (filter) the
>> > reports which were collected in the
>> > time period common to all providers?
>> >
>> > Something like
>> >
>> >        the largest First Common CollectionDate
>> > and
>> >        the smallest Last Common CollectionDate
>> >
>> > How would I do that?
>> >
>> > I can of course do this "manually", ie
>> > collect all Providers and their first
>> > and last Collection dates and then
>> > find the Common First and Last one,
>> > but wonder if there is an elegant way
>> > of doing this :-)-O
>>
>> Dear Eberhard,
>>
>> Is each report in a csv file with those
>> two columns, and you want to unify them
>> into a dataframe with CollectionDate
>> along the rows, and other details for
>> each provider along the columns?  This
>> can be done with various apply calls and
>> reshape.  Can you please subset some
>> more example data here using dput.  It
>> makes it so much easier.
>>
>> /Rasmus
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: filter() question

Rasmus Liland-3
In reply to this post by Dr Eberhard Lisse
On 2020-08-21 13:45 +0200, Dr Eberhard Lisse wrote:
|
| Eric, Rasmus,
|
| thank you very much,
|
| ALLPAP  %>%
| group_by(Provider) %>%
| mutate( minDt=min(CollectionDate),
| maxDt=max(CollectionDate)) %>%
| summarize( minDt = min(minDt),
| maxDt = max(maxDt), .groups="keep" ) %>%
| ungroup() %>%
| mutate(MAX_MIN_DATE = max(minDt),
| MIN_MAX_DATE = min(maxDt)) %>%
| distinct(MAX_MIN_DATE, MIN_MAX_DATE)
|
| gives me
|
| # A tibble: 1 x 2
| MAX_MIN_DATE MIN_MAX_DATE
| <chr>        <chr>      
| 1 2010-02-05   2019-08-30  
|
| which is correct, and what I wanted.
|
| This is so cool :-)-O

Dear Eberhard,

handling Dates is a bit tricky in normal
R, but as long as they are characters,
like in your example there, everything
is fine.  So I made this example based
on Eric's example:

        set.seed(3)
        size <- 20
        x <- as.Date("2016-11-03") +
          sample(
            0:30,
            size,
            repl=TRUE)
        provider <- paste("Dr",
          sample(
            LETTERS[1:3],
            size,
            repl=TRUE))
        lDf <- data.frame(
          Provider=provider,
          CollectionDate=x,
          stringsAsFactors=FALSE)
       
        Provider <- sort(unique(lDf$Provider))
        a <- t(sapply(Provider, function(provider, lDf) {
            cd <- lDf[
              lDf$Provider==provider,
              "CollectionDate"]
            c("Provider"=provider,
              as.character(c(
                "u"=min(cd),
                "v"=max(cd))))
          }, lDf=lDf))
        a

which yields

             Provider u            v
        Dr A "Dr A"   "2016-11-06" "2016-12-01"
        Dr B "Dr B"   "2016-11-07" "2016-12-03"
        Dr C "Dr C"   "2016-11-04" "2016-11-12"

Before I did that, I thought about doing
something with reshape2, but I could not
come up with something good.

If you want to work with tibbles in that
tidyverse thing, which probably can more
easily work with Dates, rbinding tibbles
together apparently works:

        a <- lapply(Provider, function(provider, lDf) {
            cd <- lDf[
              lDf$Provider==provider,
              "CollectionDate"]
            dplyr::tibble(
              "Provider"=provider,
              "u"=min(cd),
              "v"=max(cd))
          }, lDf=lDf)
        a <- do.call(rbind, a)
        a

which yields

        # A tibble: 3 x 3
          Provider u          v
          <chr>    <date>     <date>
        1 Dr A     2016-11-06 2016-12-01
        2 Dr B     2016-11-07 2016-12-03
        3 Dr C     2016-11-04 2016-11-12

Best,
Rasmus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: filter() question

Dr Eberhard Lisse
Rasmus,

thank you,

I am an elderly Gynecologist, dabbling a little, ie exactly the
clientele for which the tidyverse "thingy" was developed :-)-O.

In addition I like readable code so I later understand what I was trying
to do :-)-O

el


On 2020-08-21 16:15 , Rasmus Liland wrote:
> On 2020-08-21 13:45 +0200, Dr Eberhard Lisse wrote:
[...]
> If you want to work with tibbles in that tidyverse thing, which
> probably can more easily work with Dates, rbinding tibbles together
> apparently works:
[...]

--
If you want to email me, replace nospam with el



______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

signature.asc (849 bytes) Download Attachment