challenging data merging/joining problem

challenging data merging/joining problem

Christopher W. Ryan
I've been conducting relatively simple COVID-19 surveillance for our
jurisdiction. We get data on lab test results automatically, and then
interview patients to obtain other information, like clinical details.
We had been recording all data in our long-time data system (call it
dataSystemA). But as of a particular date, there was a major change in
the data system we were compelled to use. Call the new one dataSystemB.
dataSystemA and dataSystemB contain very similar information,
conceptually, but the variable names are all different, and there are
some variables in one that do not appear in the other. Total number of
variables in each is about 50-70.

Furthermore, for about 2 weeks prior to the transition, lab test results
started being deposited into dataSystemB while dataSystemA was still
being used to record the full information from the interviews.
Subsequent to the transition, lab test results and interview information
are being recorded in dataSystemB, while the lab test results alone are
still being automatically deposited into dataSystemA.

Diagrammatically:

dataSystemA usage: ____________________ ............>>

dataSystemB usage:               ......._____________>>

where ________ represents full data and ..... represents partial data,
and >> represents the progress of time.


The following will create an MWE of the data wrangling problem, with the
change in data systems made to occur overnight on 2020-07-07:

library(dplyr)
dataSystemA <- tibble(
  lastName    = c("POTTER", "WEASLEY", "GRAINGER", "LONGBOTTOM"),
  firstName   = c("harry", "ron", "hermione", "neville"),
  dob         = as.Date(Sys.Date() + c(sample(-3650:-3000, size = 2), -3500, -3450)),
  onsetDate   = as.Date(Sys.Date() + 1:4),
  symptomatic = c(TRUE, FALSE, NA, NA))
dataSystemB <- tibble(
  last_name        = c("GRAINGER", "LONGBOTTOM", "MALFOY", "LOVEGOOD", "DIGGORY"),
  first_name       = c("hermione", "neville", "draco", "luna", "cedric"),
  birthdate        = as.Date(Sys.Date() + c(-3500, -3450, sample(-3650:-3000, size = 3))),
  date_of_onset    = as.Date(Sys.Date() + 3:7),
  symptoms_present = c(TRUE, TRUE, FALSE, FALSE, TRUE))



Obviously, this is all the same public health problem, so I don't want a
big uninterpretable gap in my reports. I am looking for advice on the
best strategy for combining two different tibbles with some overlap in
observations (some patients appear in both data systems, with varying
degrees of completeness of data) and with some of the same things being
measured and recorded in the two data systems, but with different
variable names.

I've thought of two different strategies, neither of which seems ideal
but either of which might work:

1. change the variable names in dataSystemB to match their
conceptually-identical variables in dataSystemA, and then use some
version of bind_rows()

2. Create a unique identifier from last names, first names, and dates of
birth, use some type of full_join(), matching on that identifier,
obtaining all columns from both tibbles, and then "collapse"
conceptually-identical variables like onsetDate and date_of_onset using
coalesce()
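
For illustration only, the two strategies above might be sketched as
follows. This is a minimal sketch, not a recommendation: the small fixed
tibbles are hypothetical stand-ins for the MWE (no random dates), and the
choice to prefer dataSystemA's value when both systems have one is an
assumption that would need to be checked against the transition timeline.

```r
library(dplyr)

# Hypothetical, fixed stand-ins for the two systems.
dataSystemA <- tibble(
  lastName    = c("POTTER", "GRAINGER"),
  firstName   = c("harry", "hermione"),
  dob         = as.Date(c("2010-12-16", "2010-12-05")),
  onsetDate   = as.Date(c("2020-07-06", "2020-07-08")),
  symptomatic = c(TRUE, NA)
)
dataSystemB <- tibble(
  last_name        = c("GRAINGER", "MALFOY"),
  first_name       = c("hermione", "draco"),
  birthdate        = as.Date(c("2010-12-05", "2011-07-04")),
  date_of_onset    = as.Date(c("2020-07-08", "2020-07-10")),
  symptoms_present = c(TRUE, FALSE)
)

# Strategy 1: rename B's columns to A's names, then stack the rows.
# A patient present in both systems appears twice, tagged by source.
b_as_a <- rename(dataSystemB,
                 lastName = last_name, firstName = first_name,
                 dob = birthdate, onsetDate = date_of_onset,
                 symptomatic = symptoms_present)
stacked <- bind_rows(A = dataSystemA, B = b_as_a, .id = "dataSystem")

# Strategy 2: full join on names + date of birth, then collapse each
# pair of conceptually identical columns. coalesce() takes A's value
# when it is not NA and falls back to B's otherwise (an assumption).
joined <- full_join(dataSystemA, dataSystemB,
                    by = c("lastName" = "last_name",
                           "firstName" = "first_name",
                           "dob" = "birthdate")) %>%
  mutate(onsetDate   = coalesce(onsetDate, date_of_onset),
         symptomatic = coalesce(symptomatic, symptoms_present)) %>%
  select(-date_of_onset, -symptoms_present)
```

Strategy 1 keeps one row per patient per system, which makes disagreements
between the systems visible; strategy 2 yields one row per patient but
hides which system each value came from.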

Sorry for my long-windedness. Grateful for any advice.

--Chris Ryan

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: challenging data merging/joining problem

Bert Gunter-2
*Just my opinion* : --> feel free to disregard

I would suggest that you stop thinking in terms of tidyverse functionality
and instead think of what kind of data structure you need for your ongoing
work and where you will source data to populate that structure both now --
including legacy data -- and in future. *Then* you can decide what
functionality you need and whether/how tidyverse functionality meets those
needs. It sounds like you are tying yourself in knots by restricting
yourself to what you know of one limited paradigm. R has the richness and
flexibility to create general purpose data structures (e.g. via lists) --
tidyverse functionality may or may not be sufficient or convenient for your
needs **once you have fully defined them (which only you can do).**


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: challenging data merging/joining problem

Rasmus Liland-3
In reply to this post by Christopher W. Ryan
On 2020-07-05 14:50 -0400, Christopher W. Ryan wrote:
> I've been conducting relatively simple
> COVID-19 surveillance for our jurisdiction.

Dear Christopher,

As I am a bit unfamiliar when it comes to the
tidyverse, I wrote these lines using regular
data.frames:

        ### Convert to data.frame
        dataSystemA <- as.data.frame(dataSystemA)
        dataSystemB <- as.data.frame(dataSystemB)
       
        ### Add some unique columns to show how
        #   they are formatted later in this pipe.
        dataSystemA$someIncompleteInfo <- 1:4
        dataSystemB$other_incomplete_info <-
          c("Yes", "No", "Perhaps", "Sometimes", "Yes")
       
        ### Add the dfs to a list, as perhaps the
        #   data can be read somehow using
        #   something like
        #   sapply(c("A", "B"), read.from.somewhere)
        dat <- list("A"=dataSystemA,
                    "B"=dataSystemB)
       
        ### Define a new dataSystem column in both dfs
        dat <- sapply(names(dat), function(n, dat) {
          dat[[n]]$dataSystem <- n
          return(list(dat[[n]]))
        }, dat=dat)
       
        ### Read from a csv file column names
        #   where you have defined which ones
        #   are conceptually identical.
        text <- "A,B
        lastName,last_name
        firstName,first_name
        dob,birthdate
        onsetDate,date_of_onset
        symptomatic,symptoms_present"
        #   strip.white drops any indentation inside the string.
        conceptually.identical <-
          read.csv(text=text, strip.white=TRUE,
                   stringsAsFactors=FALSE)
       
        ### Rename dataSystemA columns to the
        #   dataSystemB naming convention.
        #   idx holds positions in colnames(dat$A),
        #   so the B names are taken in mapping order.
        idx <- match(x=conceptually.identical$A,
                     table=colnames(dat$A))
        colnames(dat$A)[idx] <-
          conceptually.identical$B
       
        ### Find all column names, and fill the
        #   ones that do not exist in each
        #   df with NA, order the dfs by this
        #   vector, then rbind the dfs.
        cn <- unique(unlist(lapply(dat, colnames)))
        dat <- sapply(dat, function(x, cn) {
          x[,cn[!(cn %in% colnames(x))]] <- NA
          list(x[,cn])
        }, cn=cn)
        dat <- do.call(rbind, dat)
       
        ### Order unified df in ascending order by
        #   last_name and birthdate
        dat <- dat[order(dat$last_name,
          dat$birthdate, decreasing=FALSE),]
        rownames(dat) <- NULL
       
        dat

which yields

           last_name first_name  birthdate date_of_onset symptoms_present someIncompleteInfo dataSystem other_incomplete_info
        1    DIGGORY     cedric 2011-12-16    2020-07-12             TRUE                 NA          B                   Yes
        2   GRAINGER   hermione 2010-12-05    2020-07-08               NA                  3          A                  <NA>
        3   GRAINGER   hermione 2010-12-05    2020-07-08             TRUE                 NA          B                   Yes
        4 LONGBOTTOM    neville 2011-01-24    2020-07-09               NA                  4          A                  <NA>
        5 LONGBOTTOM    neville 2011-01-24    2020-07-09             TRUE                 NA          B                    No
        6   LOVEGOOD       luna 2011-03-15    2020-07-11            FALSE                 NA          B             Sometimes
        7     MALFOY      draco 2011-07-04    2020-07-10            FALSE                 NA          B               Perhaps
        8     POTTER      harry 2010-12-16    2020-07-06             TRUE                  1          A                  <NA>
        9    WEASLEY        ron 2010-12-30    2020-07-07            FALSE                  2          A                  <NA>

When comparing the incomplete columns in each
data system, it might be useful to do some
reshaping like this:

        cols <- c("last_name", "birthdate", "dataSystem", "date_of_onset")
        reshape(dat[,cols],
                idvar=c("last_name", "birthdate"),
                timevar="dataSystem",
                direction="wide")

which yields

           last_name  birthdate date_of_onset.B date_of_onset.A
        1    DIGGORY 2011-03-17      2020-07-13            <NA>
        2   GRAINGER 2010-12-06      2020-07-09      2020-07-09
        4 LONGBOTTOM 2011-01-25      2020-07-10      2020-07-10
        6   LOVEGOOD 2010-10-15      2020-07-12            <NA>
        7     MALFOY 2010-12-25      2020-07-11            <NA>
        8     POTTER 2011-05-09            <NA>      2020-07-07
        9    WEASLEY 2012-04-05            <NA>      2020-07-08
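
A possible next step after stacking, sketched here under assumptions: if
one row per patient is wanted, the duplicated rows can be collapsed by
taking the first non-missing value in each column. The data frame below is
a hypothetical, fixed excerpt in the shape of the stacked result, the helper
first_non_na() is made up for this sketch, and "first" simply means row
order, so rows from the preferred system would need to be sorted first.

```r
# Hypothetical stacked data (fixed values, same layout as the result above).
dat <- data.frame(
  last_name          = c("GRAINGER", "GRAINGER", "POTTER"),
  birthdate          = as.Date(c("2010-12-05", "2010-12-05", "2010-12-16")),
  symptoms_present   = c(NA, TRUE, TRUE),
  someIncompleteInfo = c(3L, NA, 1L),
  dataSystem         = c("A", "B", "A"),
  stringsAsFactors   = FALSE
)

# First non-missing value of a vector, or NA if all values are missing.
# Indexing (rather than ifelse) keeps the class, e.g. Date.
first_non_na <- function(v) {
  i <- which(!is.na(v))
  if (length(i)) v[i[1]] else v[NA_integer_]
}

# Split by patient key, collapse every column, and bind the rows back.
key <- paste(dat$last_name, dat$birthdate)
value_cols <- setdiff(colnames(dat), "dataSystem")
pieces <- lapply(split(dat[, value_cols], key), function(d) {
  as.data.frame(lapply(d, first_non_na), stringsAsFactors = FALSE)
})
collapsed <- do.call(rbind, pieces)
rownames(collapsed) <- NULL

collapsed
```

The dataSystem column is dropped here because a collapsed row can mix
values from both systems.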

Best,
Rasmus


Re: [External] challenging data merging/joining problem

Richard M. Heiberger
In reply to this post by Christopher W. Ryan
Have you talked directly to the designers of the new database?
One would hope that they had a clear migration path in mind.
Perhaps they just didn't document it to your satisfaction.

Rich


Re: [External] challenging data merging/joining problem

Eric Berger
Hi Christopher,
This seems pretty standard and straightforward, unless I am missing
something. You can do the "full join" without changing variable names.
Here's a small code example with two tibbles, a and b, where the
column 'x' in a corresponds to the column 'u' in b.

a <- tibble(x=1:15,y=21:35)
b <- tibble(u=c(1:10,51:55),z=31:45)
foo <- merge(a,b,by.x="x",by.y="u",all.x=TRUE,all.y=TRUE)
foo

#     x  y  z
# 1   1 21 31
# 2   2 22 32
# 3   3 23 33
# 4   4 24 34
# 5   5 25 35
# 6   6 26 36
# 7   7 27 37
# 8   8 28 38
# 9   9 29 39
# 10 10 30 40
# 11 11 31 NA
# 12 12 32 NA
# 13 13 33 NA
# 14 14 34 NA
# 15 15 35 NA
# 16 51 NA 41
# 17 52 NA 42
# 18 53 NA 43
# 19 54 NA 44
# 20 55 NA 45

HTH,
Eric


Re: [External] challenging data merging/joining problem

Rasmus Liland-3
On 2020-07-06 12:03 +0300, Eric Berger wrote:

> On Mon, Jul 6, 2020 at 2:07 AM Richard M. Heiberger <[hidden email]> wrote:
> > On Sun, Jul 5, 2020 at 2:51 PM Christopher W. Ryan <[hidden email]> wrote:
> > >
> > > I've been conducting relatively simple
> > > COVID-19 surveillance for our
> > > jurisdiction.
> >
> > Have you talked directly to the designers
> > of the new database?
>
> Hi Christopher,
> This seems pretty standard and
> straightforward, unless I am missing
> something. You can do the "full join"
> without changing variable names.  Here's a
> small code example with two tibbles, a and
> b, where the column 'x' in a corresponds to
> the column 'u' in b.
>
> a <- tibble(x=1:15,y=21:35)
> b <- tibble(u=c(1:10,51:55),z=31:45)
> foo <- merge(a,b,by.x="x",by.y="u",all.x=TRUE,all.y=TRUE)

Perhaps something like

        new_names <-
          c("dob"="birthdate",
            "lastName"="last_name",
            "firstName"="first_name")
        idx <- match(x=names(new_names),
          table=colnames(dataSystemA))
        colnames(dataSystemA)[idx] <- new_names
        merge(
          x=dataSystemA,
          y=dataSystemB,
          by=new_names,
          all=TRUE)

which yields

           birthdate  last_name first_name  onsetDate
        1 2010-10-11   LOVEGOOD       luna       <NA>
        2 2010-12-06   GRAINGER   hermione 2020-07-09
        3 2011-01-25 LONGBOTTOM    neville 2020-07-10
        4 2011-07-03     MALFOY      draco       <NA>
        5 2011-07-14    WEASLEY        ron 2020-07-08
        6 2011-10-04     POTTER      harry 2020-07-07
        7 2012-02-13    DIGGORY     cedric       <NA>
          symptomatic date_of_onset symptoms_present
        1          NA    2020-07-12            FALSE
        2          NA    2020-07-09             TRUE
        3          NA    2020-07-10             TRUE
        4          NA    2020-07-11            FALSE
        5       FALSE          <NA>               NA
        6        TRUE          <NA>               NA
        7          NA    2020-07-13             TRUE

?
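
The merged result still carries each concept twice (onsetDate and
date_of_onset, symptomatic and symptoms_present). A base R sketch of one
way to collapse those pairs is given below, with the caveat that it rests
on assumptions: the data frame is a hypothetical, fixed excerpt in the
shape of the merge output, the pairs vector and the "_conflict" column
names are invented for illustration, and dataSystemA's value is kept
whenever both systems have one.

```r
# Hypothetical merged data (fixed values; note the deliberate disagreement
# in the first row's onset dates).
merged <- data.frame(
  last_name        = c("GRAINGER", "MALFOY", "POTTER"),
  onsetDate        = as.Date(c("2020-07-08", NA, "2020-07-06")),
  date_of_onset    = as.Date(c("2020-07-09", "2020-07-10", NA)),
  symptomatic      = c(NA, NA, TRUE),
  symptoms_present = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Pairs of conceptually identical columns: names are A's, values are B's.
pairs <- c(onsetDate = "date_of_onset", symptomatic = "symptoms_present")

for (a_col in names(pairs)) {
  b_col <- pairs[[a_col]]
  a <- merged[[a_col]]
  b <- merged[[b_col]]
  # Flag rows where both systems have a value and the values differ.
  merged[[paste0(a_col, "_conflict")]] <- !is.na(a) & !is.na(b) & a != b
  # Fill A's missing values from B; subassignment keeps the column class.
  a[is.na(a)] <- b[is.na(a)]
  merged[[a_col]] <- a
  merged[[b_col]] <- NULL
}

merged
```

Flagging conflicts before dropping B's column leaves a record of rows that
may deserve a manual look.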
