dplyr - add/expand rows

Hutchinson, David (EC)
I have a tibble of station operational records similar to the following:

> data.collection
# A tibble: 5 x 4
  STATION_NUMBER YEAR_FROM YEAR_TO RECORD
           <chr>     <int>   <int>  <chr>
1        07EA001      1960    1960    QMS
2        07EA001      1961    1970    QMC
3        07EA001      1971    1971    QMM
4        07EA001      1972    1976    QMC
5        07EA001      1977    1983    QRC

I would like to reshape this to one operational record (row) per year per station. Something like:

07EA001              1960      QMS
07EA001              1961      QMC
07EA001              1962      QMC
07EA001              1963      QMC
...
07EA001              1971      QMM

Can this be done in dplyr easily?

Thanks in advance,

David

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: dplyr - add/expand rows

R help mailing list-2
dplyr may have something for this, but in base R I think the following does
what you want.  I've shortened the name of your data set to 'd'.

i <- rep(seq_len(nrow(d)), d$YEAR_TO-d$YEAR_FROM+1)
j <- sequence(d$YEAR_TO-d$YEAR_FROM+1)
transform(d[i,], YEAR=YEAR_FROM+j-1, YEAR_FROM=NULL, YEAR_TO=NULL)
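To see what the index vectors i and j are doing, here is a minimal sketch on a small, hypothetical two-station data frame (the second station number and all values are made up for illustration; only the column names follow the original post). Because whole rows are repeated, the station column is simply carried along with each expanded row.

```r
# Hypothetical input: two stations, three operational records.
d <- data.frame(
  STATION_NUMBER = c("07EA001", "07EA001", "07EB002"),
  YEAR_FROM      = c(1960L, 1961L, 1970L),
  YEAR_TO        = c(1960L, 1963L, 1971L),
  RECORD         = c("QMS", "QMC", "QRC"),
  stringsAsFactors = FALSE
)

n <- d$YEAR_TO - d$YEAR_FROM + 1L  # years covered by each row
i <- rep(seq_len(nrow(d)), n)      # each row index repeated once per year
j <- sequence(n)                   # running position within each repeated row

# Repeat the rows, compute the year, and drop the two range columns.
expanded <- transform(d[i, ], YEAR = YEAR_FROM + j - 1L,
                      YEAR_FROM = NULL, YEAR_TO = NULL)
rownames(expanded) <- NULL
expanded
```

The same three lines work unchanged however many stations there are, since nothing in them groups by station.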


Bill Dunlap
TIBCO Software
wdunlap tibco.com

Re: dplyr - add/expand rows

David Winsemius
In reply to this post by Hutchinson, David (EC)

> Can this be done in dplyr easily?

Probably, yes. One feasible plan is to "fill in" the gaps with a last-observation-carried-forward value within each station number; the na.locf function in package zoo is very handy for such tasks. Alternatively, merge this data with a skeleton data object containing the station numbers and `seq`-built vectors spanning the range of years. Why don't you post a data example with sufficient complexity to represent the problem? Perhaps:

 dput( head( data.collection, 20) )

It's clear that the first 5 lines are not sufficient since there's only one station. It's kind of a pain to try to construct tibble objects from their print output representations. And posting code to build examples is a specific suggestion in the Posting Guide.
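The skeleton-and-fill idea mentioned above could look roughly like the following. This is only a sketch in base R on a hypothetical two-station data frame (names and values are assumptions, not the poster's data), and the small locf() helper stands in for zoo::na.locf. It also assumes the records of a station are contiguous: carrying the last observation forward would fill any gap between records with the earlier record.

```r
# Hypothetical two-station example; column names are illustrative.
d <- data.frame(
  station = c("A", "A", "B"),
  from    = c(1960L, 1961L, 1970L),
  to      = c(1960L, 1963L, 1971L),
  record  = c("QMS", "QMC", "QRC"),
  stringsAsFactors = FALSE
)

# Skeleton: one row per station per year across that station's full span.
skeleton <- do.call(rbind, lapply(split(d, d$station), function(s) {
  data.frame(station = s$station[1], year = seq(min(s$from), max(s$to)),
             stringsAsFactors = FALSE)
}))

# Attach each record to the year in which it starts; other years get NA.
filled <- merge(skeleton, d[c("station", "from", "record")],
                by.x = c("station", "year"), by.y = c("station", "from"),
                all.x = TRUE)
filled <- filled[order(filled$station, filled$year), ]

# Base R last-observation-carried-forward helper (leading NAs stay NA).
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))
  x[replace(idx, idx == 0, NA)]
}

# Fill within each station separately, then restore the original row order.
filled$record <- unsplit(lapply(split(filled$record, filled$station), locf),
                         filled$station)
rownames(filled) <- NULL
filled
```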

--
David

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law

Re: dplyr - add/expand rows

Bert Gunter-2
In reply to this post by R help mailing list-2
To David W.'s point about lack of a suitable reprex ("reproducible
example"), Bill's solution seems to be for only one station.

Here is a reprex and modification that I think does what was requested for
multiple stations, again using base R and data frames, not dplyr and
tibbles.

First the reprex with **two** stations:

> d <- data.frame( station = rep(c("one","two"),c(5,4)),
               from = c(60,61,71,72,76,60,65,82,83),
                to = c(60,70,71,76,83,64, 81, 82,83),
                record = c("A","B","C","B","D","B","B","D","E"))

> d
  station from to record
1     one   60 60      A
2     one   61 70      B
3     one   71 71      C
4     one   72 76      B
5     one   76 83      D
6     two   60 64      B
7     two   65 81      B
8     two   82 82      D
9     two   83 83      E

## Now the conversion code using base R, especially by():

> out <- by(d, d$station, function(x) with(x, {
+    i <- to - from +1
+    data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD =rep(record,i))
+ }))


> out <- data.frame(station = rep(names(out), sapply(out, nrow)),
+                   do.call(rbind, out), row.names = NULL)


> out
   station YEAR RECORD
1      one   60      A
2      one   61      B
3      one   62      B
4      one   63      B
5      one   64      B
6      one   65      B
7      one   66      B
8      one   67      B
9      one   68      B
10     one   69      B
11     one   70      B
12     one   71      C
13     one   72      B
14     one   73      B
15     one   74      B
16     one   75      B
17     one   76      B
18     one   76      D
19     one   77      D
20     one   78      D
21     one   79      D
22     one   80      D
23     one   81      D
24     one   82      D
25     one   83      D
26     two   60      B
27     two   61      B
28     two   62      B
29     two   63      B
30     two   64      B
31     two   65      B
32     two   66      B
33     two   67      B
34     two   68      B
35     two   69      B
36     two   70      B
37     two   71      B
38     two   72      B
39     two   73      B
40     two   74      B
41     two   75      B
42     two   76      B
43     two   77      B
44     two   78      B
45     two   79      B
46     two   80      B
47     two   81      B
48     two   82      D
49     two   83      E
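Note that in the reprex above, rows 4 and 5 of station "one" both cover year 76 (72-76 and 76-83), so that year appears twice in the expanded output. A quick sanity check along the following lines can flag such overlaps. It is a sketch that rebuilds the same data and expands it with rep() and sequence() directly rather than by(), so the variable names n and overlaps are mine, not part of the code above.

```r
# Same two-station reprex as above.
d <- data.frame(station = rep(c("one", "two"), c(5, 4)),
                from = c(60, 61, 71, 72, 76, 60, 65, 82, 83),
                to = c(60, 70, 71, 76, 83, 64, 81, 82, 83),
                record = c("A", "B", "C", "B", "D", "B", "B", "D", "E"),
                stringsAsFactors = FALSE)

n <- d$to - d$from + 1                       # years covered by each row
out <- data.frame(station = rep(d$station, n),
                  YEAR = rep(d$from, n) + sequence(n) - 1,
                  RECORD = rep(d$record, n))

# The expansion should yield exactly one row per covered year.
stopifnot(nrow(out) == sum(n))

# Station-years claimed by more than one record.
key <- out[c("station", "YEAR")]
overlaps <- out[duplicated(key) | duplicated(key, fromLast = TRUE), ]
overlaps
```

Whether such an overlap is a data error or a genuine change of record type within a year is something only the data owner can decide.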

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

Re: dplyr - add/expand rows

jholtman
try this:

##########################################

library(dplyr)

input <- tribble(
  ~station, ~from, ~to, ~record,
 "07EA001" ,    1960  ,  1960  , "QMS",
 "07EA001"  ,   1961 ,   1970  , "QMC",
 "07EA001" ,    1971  ,  1971  , "QMM",
 "07EA001" ,    1972  ,  1976  , "QMC",
 "07EA001" ,    1977  ,  1983  , "QRC"
)

result <- input %>%
  rowwise() %>%
  do(tibble(station = .$station,
            year = seq(.$from, .$to),
            record = .$record)
  )

###########################
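do() has since been marked as superseded in dplyr, so a variant that avoids it may be useful. The sketch below assumes the tidyr package is installed alongside dplyr (tidyr is not used anywhere else in this thread): it builds a list-column of year sequences and then unnests it into one row per year.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is available; it provides unnest()

input <- tribble(
  ~station,  ~from, ~to,  ~record,
  "07EA001", 1960,  1960, "QMS",
  "07EA001", 1961,  1970, "QMC",
  "07EA001", 1971,  1971, "QMM",
  "07EA001", 1972,  1976, "QMC",
  "07EA001", 1977,  1983, "QRC"
)

result <- input %>%
  mutate(year = Map(seq, from, to)) %>%  # list-column: one year vector per row
  unnest(year) %>%                       # one output row per year
  select(station, year, record)

result
```

Like the rowwise() version, this works row by row, so additional stations need no special handling.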



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Nov 26, 2017 at 2:10 PM, Bert Gunter <[hidden email]> wrote:

> To David W.'s point about lack of a suitable reprex ("reproducible
> example"), Bill's solution seems to be for only one station.
>
> Here is a reprex and modification that I think does what was requested for
> multiple stations, again using base R and data frames, not dplyr and
> tibbles.
>
> First the reprex with **two** stations:
>
> > d <- data.frame( station = rep(c("one","two"),c(5,4)),
>                from = c(60,61,71,72,76,60,65,82,83),
>                 to = c(60,70,71,76,83,64, 81, 82,83),
>                 record = c("A","B","C","B","D","B","B","D","E"))
>
> > d
>   station from to record
> 1     one   60 60      A
> 2     one   61 70      B
> 3     one   71 71      C
> 4     one   72 76      B
> 5     one   76 83      D
> 6     two   60 64      B
> 7     two   65 81      B
> 8     two   82 82      D
> 9     two   83 83      E
>
> ## Now the conversion code using base R, especially by():
>
> > out <- by(d, d$station, function(x) with(x, {
> +    i <- to - from +1
> +    data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD =rep(record,i))
> + }))
>
>
> > out <- data.frame(station =
> rep(names(out),sapply(out,nrow)),do.call(rbind,out), row.names = NULL)
>
>
> > out
>    station YEAR RECORD
> 1      one   60      A
> 2      one   61      B
> 3      one   62      B
> 4      one   63      B
> 5      one   64      B
> 6      one   65      B
> 7      one   66      B
> 8      one   67      B
> 9      one   68      B
> 10     one   69      B
> 11     one   70      B
> 12     one   71      C
> 13     one   72      B
> 14     one   73      B
> 15     one   74      B
> 16     one   75      B
> 17     one   76      B
> 18     one   76      D
> 19     one   77      D
> 20     one   78      D
> 21     one   79      D
> 22     one   80      D
> 23     one   81      D
> 24     one   82      D
> 25     one   83      D
> 26     two   60      B
> 27     two   61      B
> 28     two   62      B
> 29     two   63      B
> 30     two   64      B
> 31     two   65      B
> 32     two   66      B
> 33     two   67      B
> 34     two   68      B
> 35     two   69      B
> 36     two   70      B
> 37     two   71      B
> 38     two   72      B
> 39     two   73      B
> 40     two   74      B
> 41     two   75      B
> 42     two   76      B
> 43     two   77      B
> 44     two   78      B
> 45     two   79      B
> 46     two   80      B
> 47     two   81      B
> 48     two   82      D
> 49     two   83      E
>
> Cheers,
> Bert
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sat, Nov 25, 2017 at 4:49 PM, William Dunlap via R-help <
> [hidden email]> wrote:
>
> > dplyr may have something for this, but in base R I think the following
> does
> > what you want.  I've shortened the name of your data set to 'd'.
> >
> > i <- rep(seq_len(nrow(d)), d$YEAR_TO-d$YEAR_FROM+1)
> > j <- sequence(d$YEAR_TO-d$YEAR_FROM+1)
> > transform(d[i,], YEAR=YEAR_FROM+j-1, YEAR_FROM=NULL, YEAR_TO=NULL)
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Sat, Nov 25, 2017 at 11:18 AM, Hutchinson, David (EC) <
> > [hidden email]> wrote:
> >
> > > I have a returned tibble of station operational record similar to the
> > > following:
> > >
> > > > data.collection
> > > # A tibble: 5 x 4
> > >   STATION_NUMBER YEAR_FROM YEAR_TO RECORD
> > >            <chr>     <int>   <int>  <chr>
> > > 1        07EA001      1960    1960    QMS
> > > 2        07EA001      1961    1970    QMC
> > > 3        07EA001      1971    1971    QMM
> > > 4        07EA001      1972    1976    QMC
> > > 5        07EA001      1977    1983    QRC
> > >
> > > I would like to reshape this to one operational record (row) per year
> per
> > > station. Something like:
> > >
> > > 07EA001              1960      QMS
> > > 07EA001              1961      QMC
> > > 07EA001              1962      QMC
> > > 07EA001              1963      QMC
> > > ...
> > > 07EA001              1971      QMM
> > >
> > > Can this be done in dplyr easily?
> > >
> > > Thanks in advance,
> > >
> > > David
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/
> > > posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: dplyr - add/expand rows

Martin Morgan-3
In a bit more 'base R' mode I did

   input$year <- with(input, Map(seq, from, to))
   res0 <- with(input, Map(data.frame, station=station, year=year,
       record=record))
   as_tibble(do.call(rbind, unname(res0)))

resulting in

 > as_tibble(do.call(rbind, unname(res0)))
 # A tibble: 24 x 3
    station  year record
     <fctr> <int> <fctr>
  1 07EA001  1960    QMS
  2 07EA001  1961    QMC
  3 07EA001  1962    QMC
  4 07EA001  1963    QMC
  5 07EA001  1964    QMC
  6 07EA001  1965    QMC
  7 07EA001  1966    QMC
  8 07EA001  1967    QMC
  9 07EA001  1968    QMC
10 07EA001  1969    QMC
# ... with 14 more rows

I thought I should have been able to use `tibble` in the second step, but
that leads to a (cryptic) error

 > res0 <- with(input, Map(tibble, station=station, year=year,
 +     record=record))
 Error in captureDots(strict = `__quosured`) :
   the argument has already been evaluated

The 'station' and 'record' columns are factors, so different from the
original input, but this seems the appropriate data type for these columns.

It's interesting to compare the 'specialized' knowledge needed for each
approach -- rowwise(), do(), .$ for tidyverse, with(), do.call(), maybe
rbind() and Map() for base R.
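If character columns are preferred over factors, one option is to pass stringsAsFactors = FALSE when each piece is built. The following is a sketch of that variant in base R only: it starts from a plain data.frame with the same values as the input above (rather than the tibble) and uses an anonymous function inside Map(); the names years, pieces and result are mine.

```r
# Same values as the 'input' tibble above, as a plain data.frame.
input <- data.frame(
  station = "07EA001",
  from    = c(1960, 1961, 1971, 1972, 1977),
  to      = c(1960, 1970, 1971, 1976, 1983),
  record  = c("QMS", "QMC", "QMM", "QMC", "QRC"),
  stringsAsFactors = FALSE
)

# One vector of years per input row.
years <- Map(seq, input$from, input$to)

# One small data.frame per input row; scalars are recycled along 'year'.
pieces <- Map(function(s, y, r) {
  data.frame(station = s, year = y, record = r, stringsAsFactors = FALSE)
}, input$station, years, input$record)

# Stack the pieces; unname() avoids row names derived from the station labels.
result <- do.call(rbind, unname(pieces))
str(result)
```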

Martin

Re: dplyr - add/expand rows

Michael Lawrence-3
Or with the Bioconductor IRanges package:

library(IRanges)  # Bioconductor; also attaches S4Vectors (DataFrame, expand)
df <- with(input, DataFrame(station, year=IRanges(from, to), record))
expand(df, "year")

DataFrame with 24 rows and 3 columns
        station     year      record
    <character> <integer> <character>
1       07EA001      1960         QMS
2       07EA001      1961         QMC
3       07EA001      1962         QMC
4       07EA001      1963         QMC
5       07EA001      1964         QMC
...         ...       ...         ...
20      07EA001      1979         QRC
21      07EA001      1980         QRC
22      07EA001      1981         QRC
23      07EA001      1982         QRC
24      07EA001      1983         QRC

If you tell the computer more about your data, it can do more things for
you.

Michael

On Tue, Nov 28, 2017 at 7:34 AM, Martin Morgan <
[hidden email]> wrote:

> On 11/26/2017 08:42 PM, jim holtman wrote:
>
>> try this:
>>
>> ##########################################
>>
>> library(dplyr)
>>
>> input <- tribble(
>>    ~station, ~from, ~to, ~record,
>>   "07EA001" ,    1960  ,  1960  , "QMS",
>>   "07EA001"  ,   1961 ,   1970  , "QMC",
>>   "07EA001" ,    1971  ,  1971  , "QMM",
>>   "07EA001" ,    1972  ,  1976  , "QMC",
>>   "07EA001" ,    1977  ,  1983  , "QRC"
>> )
>>
>> result <- input %>%
>>    rowwise() %>%
>>    do(tibble(station = .$station,
>>              year = seq(.$from, .$to),
>>              record = .$record)
>>    )
>>
>> ###########################
>>
>
> In a bit more 'base R' mode I did
>
>   input$year <- with(input, Map(seq, from, to))
>   res0 <- with(input, Map(data.frame, station=station, year=year,
>       record=record))
>    as_tibble(do.call(rbind, unname(res0)))# A tibble: 24 x 3
>
> resulting in
>
> > as_tibble(do.call(rbind, unname(res0)))# A tibble: 24 x 3
>    station  year record
>     <fctr> <int> <fctr>
>  1 07EA001  1960    QMS
>  2 07EA001  1961    QMC
>  3 07EA001  1962    QMC
>  4 07EA001  1963    QMC
>  5 07EA001  1964    QMC
>  6 07EA001  1965    QMC
>  7 07EA001  1966    QMC
>  8 07EA001  1967    QMC
>  9 07EA001  1968    QMC
> 10 07EA001  1969    QMC
> # ... with 14 more rows
>
> I though I should have been able to use `tibble` in the second step, but
> that leads to a (cryptic) error
>
> > res0 <- with(input, Map(tibble, station=station, year=year,
> record=record))Error in captureDots(strict = `__quosured`) :
>   the argument has already been evaluated
>
> The 'station' and 'record' columns are factors, so different from the
> original input, but this seems the appropriate data type for theses columns.
>
> It's interesting to compare the 'specialized' knowledge needed for each
> approach -- rowwise(), do(), .$ for tidyverse, with(), do.call(), maybe
> rbind() and Map() for base R.
>
> Martin
>
>
>
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sun, Nov 26, 2017 at 2:10 PM, Bert Gunter <[hidden email]>
>> wrote:
>>
>> To David W.'s point about lack of a suitable reprex ("reproducible
>>> example"), Bill's solution seems to be for only one station.
>>>
>>> Here is a reprex and modification that I think does what was requested
>>> for
>>> multiple stations, again using base R and data frames, not dplyr and
>>> tibbles.
>>>
>>> First the reprex with **two** stations:
>>>
>>> d <- data.frame( station = rep(c("one","two"),c(5,4)),
>>>>
>>>                 from = c(60,61,71,72,76,60,65,82,83),
>>>                  to = c(60,70,71,76,83,64, 81, 82,83),
>>>                  record = c("A","B","C","B","D","B","B","D","E"))
>>>
>>> d
>>>>
>>>    station from to record
>>> 1     one   60 60      A
>>> 2     one   61 70      B
>>> 3     one   71 71      C
>>> 4     one   72 76      B
>>> 5     one   76 83      D
>>> 6     two   60 64      B
>>> 7     two   65 81      B
>>> 8     two   82 82      D
>>> 9     two   83 83      E
>>>
>>> ## Now the conversion code using base R, especially by():
>>>
>>> out <- by(d, d$station, function(x) with(x, {
>>>>
>>> +    i <- to - from +1
>>> +    data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD =rep(record,i))
>>> + }))
>>>
>>>
>>> out <- data.frame(station =
>>>>
>>> rep(names(out),sapply(out,nrow)),do.call(rbind,out), row.names = NULL)
>>>
>>>
>>> out
>>>>
>>>     station YEAR RECORD
>>> 1      one   60      A
>>> 2      one   61      B
>>> 3      one   62      B
>>> 4      one   63      B
>>> 5      one   64      B
>>> 6      one   65      B
>>> 7      one   66      B
>>> 8      one   67      B
>>> 9      one   68      B
>>> 10     one   69      B
>>> 11     one   70      B
>>> 12     one   71      C
>>> 13     one   72      B
>>> 14     one   73      B
>>> 15     one   74      B
>>> 16     one   75      B
>>> 17     one   76      B
>>> 18     one   76      D
>>> 19     one   77      D
>>> 20     one   78      D
>>> 21     one   79      D
>>> 22     one   80      D
>>> 23     one   81      D
>>> 24     one   82      D
>>> 25     one   83      D
>>> 26     two   60      B
>>> 27     two   61      B
>>> 28     two   62      B
>>> 29     two   63      B
>>> 30     two   64      B
>>> 31     two   65      B
>>> 32     two   66      B
>>> 33     two   67      B
>>> 34     two   68      B
>>> 35     two   69      B
>>> 36     two   70      B
>>> 37     two   71      B
>>> 38     two   72      B
>>> 39     two   73      B
>>> 40     two   74      B
>>> 41     two   75      B
>>> 42     two   76      B
>>> 43     two   77      B
>>> 44     two   78      B
>>> 45     two   79      B
>>> 46     two   80      B
>>> 47     two   81      B
>>> 48     two   82      D
>>> 49     two   83      E
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and
>>> sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>

Re: dplyr - add/expand rows

R help mailing list-2
In reply to this post by Bert Gunter-2
Bert wrote
  ... Bill's solution seems to be for only one station.

No, it works for any number of stations.
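
To illustrate, here is a minimal sketch that applies the same three lines to Bert's two-station reprex. It assumes the `from`/`to` column names of that example rather than `YEAR_FROM`/`YEAR_TO`, and the helper name `n` is introduced here only for readability. Each input row is expanded independently of the others, so no grouping by station is needed.

```r
# Bert's two-station example data
d <- data.frame(station = rep(c("one", "two"), c(5, 4)),
                from = c(60, 61, 71, 72, 76, 60, 65, 82, 83),
                to   = c(60, 70, 71, 76, 83, 64, 81, 82, 83),
                record = c("A", "B", "C", "B", "D", "B", "B", "D", "E"),
                stringsAsFactors = FALSE)

n <- d$to - d$from + 1            # number of years covered by each input row
i <- rep(seq_len(nrow(d)), n)     # repeat each row index n times
j <- sequence(n)                  # 1..n within each repeated row
out <- transform(d[i, ], year = from + j - 1, from = NULL, to = NULL)
rownames(out) <- NULL

table(out$station)                # rows per station
```

The station column is simply carried along by the row subsetting `d[i, ]`, which is why the result covers both stations.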

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Nov 26, 2017 at 11:10 AM, Bert Gunter <[hidden email]>
wrote:

> To David W.'s point about lack of a suitable reprex ("reproducible
> example"), Bill's solution seems to be for only one station.
>
> Here is a reprex and modification that I think does what was requested for
> multiple stations, again using base R and data frames, not dplyr and
> tibbles.
>
> First the reprex with **two** stations:
>
> > d <- data.frame( station = rep(c("one","two"),c(5,4)),
>                from = c(60,61,71,72,76,60,65,82,83),
>                 to = c(60,70,71,76,83,64, 81, 82,83),
>                 record = c("A","B","C","B","D","B","B","D","E"))
>
> > d
>   station from to record
> 1     one   60 60      A
> 2     one   61 70      B
> 3     one   71 71      C
> 4     one   72 76      B
> 5     one   76 83      D
> 6     two   60 64      B
> 7     two   65 81      B
> 8     two   82 82      D
> 9     two   83 83      E
>
> ## Now the conversion code using base R, especially by():
>
> > out <- by(d, d$station, function(x) with(x, {
> +    i <- to - from +1
> +    data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD =rep(record,i))
> + }))
>
>
> > out <- data.frame(station = rep(names(out),sapply(out,nrow)),do.call(rbind,out),
> row.names = NULL)
>
>
> > out
>    station YEAR RECORD
> 1      one   60      A
> 2      one   61      B
> 3      one   62      B
> 4      one   63      B
> 5      one   64      B
> 6      one   65      B
> 7      one   66      B
> 8      one   67      B
> 9      one   68      B
> 10     one   69      B
> 11     one   70      B
> 12     one   71      C
> 13     one   72      B
> 14     one   73      B
> 15     one   74      B
> 16     one   75      B
> 17     one   76      B
> 18     one   76      D
> 19     one   77      D
> 20     one   78      D
> 21     one   79      D
> 22     one   80      D
> 23     one   81      D
> 24     one   82      D
> 25     one   83      D
> 26     two   60      B
> 27     two   61      B
> 28     two   62      B
> 29     two   63      B
> 30     two   64      B
> 31     two   65      B
> 32     two   66      B
> 33     two   67      B
> 34     two   68      B
> 35     two   69      B
> 36     two   70      B
> 37     two   71      B
> 38     two   72      B
> 39     two   73      B
> 40     two   74      B
> 41     two   75      B
> 42     two   76      B
> 43     two   77      B
> 44     two   78      B
> 45     two   79      B
> 46     two   80      B
> 47     two   81      B
> 48     two   82      D
> 49     two   83      E
>
> Cheers,
> Bert
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>

Re: dplyr - add/expand rows

Bert Gunter-2
Bill et al.:

Yes, I see it now. Thank you for the correction.

-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Nov 28, 2017 at 1:42 PM, William Dunlap <[hidden email]> wrote:

> Bert wrote
>   ... Bill's solution seems to be for only one station.
>
> No, it works for any number of stations.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>

Re: dplyr - add/expand rows

Tóth Dénes
In reply to this post by Michael Lawrence-3
Hi,

A benchmarking study with an additional (data.table-based) solution.
Enjoy! ;)

Cheers,
Denes


--------------------------


## packages ##########################

library(dplyr)
library(data.table)
library(IRanges)
library(microbenchmark)

## prepare example dataset ###########

## use Bert's example, with 2000 stations instead of 2
d_df <- data.frame( station = rep(rep(c("one","two"),c(5,4)), 1000L),
                     from = as.integer(c(60,61,71,72,76,60,65,82,83)),
                     to = as.integer(c(60,70,71,76,83,64, 81, 82,83)),
                     record = c("A","B","C","B","D","B","B","D","E"),
                     stringsAsFactors = FALSE)
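stations <- rle(d_df$station)
## relabel each run as a distinct station: "station000001", "station000002", ...
## (the rle component is called 'values'; 'value' only works via partial matching)
stations$values <- gsub(
   " ", "0",
   paste0("station", format(seq_along(stations$values), width = 6)))
d_df$station <- rep(stations$values, stations$lengths)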

## prepare tibble and data.table versions
d_tbl <- as_tibble(d_df)
d_dt <- as.data.table(d_df)

## solutions ##########################

## Bert - by
fun_bert <- function(d) {
   out <- by(
     d, d$station, function(x) with(x, {
       i <- to - from +1
       data.frame(record =rep(record,i),
                  year =sequence(i) -1 + rep(from,i),
                  stringsAsFactors = FALSE)
     }))
   data.frame(station = rep(names(out), sapply(out,nrow)),
              do.call(rbind,out),
              row.names = NULL,
              stringsAsFactors = FALSE)
}

## Bill - transform
fun_bill <- function(d) {
   i <- rep(seq_len(nrow(d)), d$to-d$from+1)
   j <- sequence(d$to-d$from+1)
   transform(d[i,], year=from+j-1, from=NULL, to=NULL)
}

## Michael - IRanges
fun_michael <- function(d) {
   df <- with(d, DataFrame(station, record, year=IRanges(from, to)))
   expand(df, "year")
}

## Jim - dplyr
fun_jim <- function(d) {
   d %>%
     rowwise() %>%
     do(tibble(station = .$station,
               record = .$record,
               year = seq(.$from, .$to))
     )
}

## Martin - Map
fun_martin <- function(d) {
   d$year <- with(d, Map(seq, from, to))
   res0 <- with(d, Map(data.frame,
                       station=station,
                       record=record,
                       year=year,
                       MoreArgs = list(stringsAsFactors = FALSE)))
   do.call(rbind, unname(res0))
}

## Denes - simple data.table
fun_denes <- function(d) {
   out <- d[, .(year = from:to), by = .(station, from, record)]
   out[, from := NULL]
}

## Check equality ################################
all.equal(fun_bill(d_df), fun_bert(d_df),
           check.attributes = FALSE)
all.equal(fun_bill(d_df), fun_martin(d_df),
           check.attributes = FALSE)
all.equal(fun_bill(d_df), as.data.frame(fun_michael(d_df)),
           check.attributes = FALSE)
all.equal(fun_bill(d_df), as.data.frame(fun_denes(d_dt)),
           check.attributes = FALSE)
# Be prepared: this solution is super slow
all.equal(fun_bill(d_df), as.data.frame(fun_jim(d_tbl)),
           check.attributes = FALSE)

## Benchmark #####################################

## Martin
print(system.time(fun_martin(d_df)))

## Bert
print(system.time(fun_bert(d_df)))

## Top 3
print(
   microbenchmark(
     fun_bill(d_df),
     fun_michael(d_df),
     fun_denes(d_dt),
     times = 100L
   )
)


-------------------------

On 11/28/2017 06:49 PM, Michael Lawrence wrote:

> Or with the Bioconductor IRanges package:
>
> df <- with(input, DataFrame(station, year=IRanges(from, to), record))
> expand(df, "year")
>
> DataFrame with 24 rows and 3 columns
>          station     year      record
>      <character> <integer> <character>
> 1       07EA001      1960         QMS
> 2       07EA001      1961         QMC
> 3       07EA001      1962         QMC
> 4       07EA001      1963         QMC
> 5       07EA001      1964         QMC
> ...         ...       ...         ...
> 20      07EA001      1979         QRC
> 21      07EA001      1980         QRC
> 22      07EA001      1981         QRC
> 23      07EA001      1982         QRC
> 24      07EA001      1983         QRC
>
> If you tell the computer more about your data, it can do more things for
> you.
>
> Michael
>
> On Tue, Nov 28, 2017 at 7:34 AM, Martin Morgan <
> [hidden email]> wrote:
>
>> On 11/26/2017 08:42 PM, jim holtman wrote:
>>
>>> try this:
>>>
>>> ##########################################
>>>
>>> library(dplyr)
>>>
>>> input <- tribble(
>>>     ~station, ~from, ~to, ~record,
>>>    "07EA001" ,    1960  ,  1960  , "QMS",
>>>    "07EA001"  ,   1961 ,   1970  , "QMC",
>>>    "07EA001" ,    1971  ,  1971  , "QMM",
>>>    "07EA001" ,    1972  ,  1976  , "QMC",
>>>    "07EA001" ,    1977  ,  1983  , "QRC"
>>> )
>>>
>>> result <- input %>%
>>>     rowwise() %>%
>>>     do(tibble(station = .$station,
>>>               year = seq(.$from, .$to),
>>>               record = .$record)
>>>     )
>>>
>>> ###########################
>>>
>>
>> In a bit more 'base R' mode I did
>>
>>    input$year <- with(input, Map(seq, from, to))
>>    res0 <- with(input, Map(data.frame, station=station, year=year,
>>        record=record))
>>    as_tibble(do.call(rbind, unname(res0)))
>>
>> resulting in
>>
>> > as_tibble(do.call(rbind, unname(res0)))
>> # A tibble: 24 x 3
>>     station  year record
>>      <fctr> <int> <fctr>
>>   1 07EA001  1960    QMS
>>   2 07EA001  1961    QMC
>>   3 07EA001  1962    QMC
>>   4 07EA001  1963    QMC
>>   5 07EA001  1964    QMC
>>   6 07EA001  1965    QMC
>>   7 07EA001  1966    QMC
>>   8 07EA001  1967    QMC
>>   9 07EA001  1968    QMC
>> 10 07EA001  1969    QMC
>> # ... with 14 more rows
>>
>> I thought I should have been able to use `tibble` in the second step, but
>> that leads to a (cryptic) error
>>
>> > res0 <- with(input, Map(tibble, station=station, year=year,
>> +     record=record))
>> Error in captureDots(strict = `__quosured`) :
>>    the argument has already been evaluated
>>
>> The 'station' and 'record' columns are factors, so different from the
>> original input, but this seems the appropriate data type for these columns.
>>
>> It's interesting to compare the 'specialized' knowledge needed for each
>> approach -- rowwise(), do(), .$ for tidyverse, with(), do.call(), maybe
>> rbind() and Map() for base R.
>>
>> Martin
>>
>>
>>
>>>
>>>
>>> Jim Holtman
>>> Data Munger Guru
>>>
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>>

--
Dr. Tóth Dénes ügyvezető
Kogentum Kft.
Tel.: 06-30-2583723
Web: www.kogentum.hu

Re: dplyr - add/expand rows

Martin Morgan-3
On 11/29/2017 04:15 PM, Tóth Dénes wrote:
> Hi,
>
> A benchmarking study with an additional (data.table-based) solution.

I don't think speed is the right benchmark (I do agree that correctness
is!).

For the R-help list, maybe something about least specialized R knowledge
required would be appropriate? I'd say there were some 'hard' solutions
-- Michael (deep understanding of Bioconductor and IRanges), Toth (deep
understanding of data.table), Jim (at least for me, moderate
understanding of dplyr, especially the .$ notation; a simpler dplyr
answer might have moved this response out of the 'difficult' category,
especially given the familiarity of the OP with dplyr). I'd vote for
Bill's as requiring the least specialized knowledge of R (though the +/-
1 indexing is an easy thing to get wrong).
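
The +/- 1 bookkeeping can be seen on a toy case. The sketch below uses two made-up ranges (1960-1960 and 1961-1963), not data from the thread, and the variable names are chosen here for illustration.

```r
from <- c(1960L, 1961L)
to   <- c(1960L, 1963L)

n <- to - from + 1L        # ranges are inclusive, hence the +1: 1 and 3 years
j <- sequence(n)           # 1, then 1 2 3: position within each range
i <- rep(seq_along(n), n)  # 1, then 2 2 2: which range each position belongs to

year <- from[i] + j - 1L   # j starts at 1, hence the -1
year                       # 1960 1961 1962 1963
```

Dropping the `+ 1L` would lose the last year of every range (and single-year rows entirely), while dropping the `- 1L` would shift every year forward by one.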

A different criterion might be reuse across analysis scenarios. Bill
seems to win here again, since the principles are very general and at
least moderately efficient (both Bert's and Martin's solutions are
essentially R-level iterations and have poor scalability, as
demonstrated in the microbenchmarks; Bill's is mostly vectorized).
Certainly data.table, dplyr, and IRanges are extremely useful within the
confines of the problem domains they address.
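
As a rough illustration of that difference, here is a sketch on a small made-up table (not the benchmark data; all names are chosen for this example). The row-by-row version builds one data frame per input row and binds them together, whereas the vectorized version computes all repeats in one pass. Both should produce the same rows.

```r
d <- data.frame(station = c("s1", "s1", "s2"),
                from = c(1960L, 1961L, 1970L),
                to   = c(1960L, 1963L, 1971L),
                record = c("A", "B", "C"),
                stringsAsFactors = FALSE)

# R-level iteration: one small data frame per input row, then rbind
pieces <- Map(function(s, r, f, t)
                data.frame(station = s, record = r, year = seq(f, t),
                           stringsAsFactors = FALSE),
              d$station, d$record, d$from, d$to)
looped <- do.call(rbind, unname(pieces))

# Vectorized: compute the repeat counts once, then build each column whole
n <- d$to - d$from + 1L
vectorized <- data.frame(station = rep(d$station, n),
                         record  = rep(d$record, n),
                         year    = rep(d$from, n) + sequence(n) - 1L,
                         stringsAsFactors = FALSE)

all.equal(looped, vectorized, check.attributes = FALSE)
```

The iterated version calls `data.frame()` once per input row, so its cost grows with the number of rows, which is consistent with the slower timings reported above for the `by()` and `Map()` solutions.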

Martin

> On 11/28/2017 06:49 PM, Michael Lawrence wrote:
>> Or with the Bioconductor IRanges package:
>>
>> df <- with(input, DataFrame(station, year=IRanges(from, to), record))
>> expand(df, "year")
>>
>> DataFrame with 24 rows and 3 columns
>>          station     year      record
>>      <character> <integer> <character>
>> 1       07EA001      1960         QMS
>> 2       07EA001      1961         QMC
>> 3       07EA001      1962         QMC
>> 4       07EA001      1963         QMC
>> 5       07EA001      1964         QMC
>> ...         ...       ...         ...
>> 20      07EA001      1979         QRC
>> 21      07EA001      1980         QRC
>> 22      07EA001      1981         QRC
>> 23      07EA001      1982         QRC
>> 24      07EA001      1983         QRC
>>
>> If you tell the computer more about your data, it can do more things for
>> you.
>>
>> Michael
>>
>> On Tue, Nov 28, 2017 at 7:34 AM, Martin Morgan <
>> [hidden email]> wrote:
>>
>>> On 11/26/2017 08:42 PM, jim holtman wrote:
>>>
>>>> try this:
>>>>
>>>> ##########################################
>>>>
>>>> library(dplyr)
>>>>
>>>> input <- tribble(
>>>>     ~station, ~from, ~to, ~record,
>>>>    "07EA001" ,    1960  ,  1960  , "QMS",
>>>>    "07EA001"  ,   1961 ,   1970  , "QMC",
>>>>    "07EA001" ,    1971  ,  1971  , "QMM",
>>>>    "07EA001" ,    1972  ,  1976  , "QMC",
>>>>    "07EA001" ,    1977  ,  1983  , "QRC"
>>>> )
>>>>
>>>> result <- input %>%
>>>>     rowwise() %>%
>>>>     do(tibble(station = .$station,
>>>>               year = seq(.$from, .$to),
>>>>               record = .$record)
>>>>     )
>>>>
>>>> ###########################
>>>>
>>>
>>> In a bit more 'base R' mode I did
>>>
>>>    input$year <- with(input, Map(seq, from, to))
>>>    res0 <- with(input, Map(data.frame, station=station, year=year,
>>>        record=record))
>>>    as_tibble(do.call(rbind, unname(res0)))
>>>
>>> resulting in
>>>
>>> > as_tibble(do.call(rbind, unname(res0)))
>>> # A tibble: 24 x 3
>>>     station  year record
>>>      <fctr> <int> <fctr>
>>>   1 07EA001  1960    QMS
>>>   2 07EA001  1961    QMC
>>>   3 07EA001  1962    QMC
>>>   4 07EA001  1963    QMC
>>>   5 07EA001  1964    QMC
>>>   6 07EA001  1965    QMC
>>>   7 07EA001  1966    QMC
>>>   8 07EA001  1967    QMC
>>>   9 07EA001  1968    QMC
>>> 10 07EA001  1969    QMC
>>> # ... with 14 more rows
>>>
>>> I thought I should have been able to use `tibble` in the second step, but
>>> that leads to a (cryptic) error
>>>
>>> > res0 <- with(input, Map(tibble, station=station, year=year,
>>> +     record=record))
>>> Error in captureDots(strict = `__quosured`) :
>>>    the argument has already been evaluated
>>>
>>> The 'station' and 'record' columns are factors, so different from the
>>> original input, but this seems the appropriate data type for these
>>> columns.
>>>
>>> It's interesting to compare the 'specialized' knowledge needed for each
>>> approach -- rowwise(), do(), .$ for tidyverse, with(), do.call(), maybe
>>> rbind() and Map() for base R.
>>>
>>> Martin
>>>
>>>
>>>
>>>>
>>>>
>>>> Jim Holtman
>>>> Data Munger Guru
>>>>
>>>> What is the problem that you are trying to solve?
>>>> Tell me what you want to do, not how you want to do it.
>>>>
>>>> On Sun, Nov 26, 2017 at 2:10 PM, Bert Gunter <[hidden email]>
>>>> wrote:
>>>>
>>>>> To David W.'s point about lack of a suitable reprex ("reproducible
>>>>> example"), Bill's solution seems to be for only one station.
>>>>>
>>>>> Here is a reprex and modification that I think does what was requested
>>>>> for multiple stations, again using base R and data frames, not dplyr and
>>>>> tibbles.
>>>>>
>>>>> First the reprex with **two** stations:
>>>>>
>>>>> > d <- data.frame( station = rep(c("one","two"),c(5,4)),
>>>>>                  from = c(60,61,71,72,76,60,65,82,83),
>>>>>                   to = c(60,70,71,76,83,64, 81, 82,83),
>>>>>                   record = c("A","B","C","B","D","B","B","D","E"))
>>>>>
>>>>> > d
>>>>>     station from to record
>>>>> 1     one   60 60      A
>>>>> 2     one   61 70      B
>>>>> 3     one   71 71      C
>>>>> 4     one   72 76      B
>>>>> 5     one   76 83      D
>>>>> 6     two   60 64      B
>>>>> 7     two   65 81      B
>>>>> 8     two   82 82      D
>>>>> 9     two   83 83      E
>>>>>
>>>>> ## Now the conversion code using base R, especially by():
>>>>>
>>>>> > out <- by(d, d$station, function(x) with(x, {
>>>>> +    i <- to - from +1
>>>>> +    data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD =rep(record,i))
>>>>> + }))
>>>>>
>>>>> > out <- data.frame(station = rep(names(out),sapply(out,nrow)),
>>>>> +                   do.call(rbind,out), row.names = NULL)
>>>>>
>>>>> > out
>>>>>      station YEAR RECORD
>>>>> 1      one   60      A
>>>>> 2      one   61      B
>>>>> 3      one   62      B
>>>>> 4      one   63      B
>>>>> 5      one   64      B
>>>>> 6      one   65      B
>>>>> 7      one   66      B
>>>>> 8      one   67      B
>>>>> 9      one   68      B
>>>>> 10     one   69      B
>>>>> 11     one   70      B
>>>>> 12     one   71      C
>>>>> 13     one   72      B
>>>>> 14     one   73      B
>>>>> 15     one   74      B
>>>>> 16     one   75      B
>>>>> 17     one   76      B
>>>>> 18     one   76      D
>>>>> 19     one   77      D
>>>>> 20     one   78      D
>>>>> 21     one   79      D
>>>>> 22     one   80      D
>>>>> 23     one   81      D
>>>>> 24     one   82      D
>>>>> 25     one   83      D
>>>>> 26     two   60      B
>>>>> 27     two   61      B
>>>>> 28     two   62      B
>>>>> 29     two   63      B
>>>>> 30     two   64      B
>>>>> 31     two   65      B
>>>>> 32     two   66      B
>>>>> 33     two   67      B
>>>>> 34     two   68      B
>>>>> 35     two   69      B
>>>>> 36     two   70      B
>>>>> 37     two   71      B
>>>>> 38     two   72      B
>>>>> 39     two   73      B
>>>>> 40     two   74      B
>>>>> 41     two   75      B
>>>>> 42     two   76      B
>>>>> 43     two   77      B
>>>>> 44     two   78      B
>>>>> 45     two   79      B
>>>>> 46     two   80      B
>>>>> 47     two   81      B
>>>>> 48     two   82      D
>>>>> 49     two   83      E
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people keep coming along
>>>>> and
>>>>> sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>
>>>>> On Sat, Nov 25, 2017 at 4:49 PM, William Dunlap via R-help <
>>>>> [hidden email]> wrote:
>>>>>
>>>>>> dplyr may have something for this, but in base R I think the following
>>>>>> does what you want.  I've shortened the name of your data set to 'd'.
>>>>>>
>>>>>> i <- rep(seq_len(nrow(d)), d$YEAR_TO-d$YEAR_FROM+1)
>>>>>> j <- sequence(d$YEAR_TO-d$YEAR_FROM+1)
>>>>>> transform(d[i,], YEAR=YEAR_FROM+j-1, YEAR_FROM=NULL, YEAR_TO=NULL)
>>>>>>
>>>>>>
>>>>>> Bill Dunlap
>>>>>> TIBCO Software
>>>>>> wdunlap tibco.com
>>>>>>
>>>>>> On Sat, Nov 25, 2017 at 11:18 AM, Hutchinson, David (EC) <
>>>>>> [hidden email]> wrote:
>>>>>>
>>>>>>> I have a returned tibble of station operational record similar to the
>>>>>>> following:
>>>>>>>
>>>>>>> > data.collection
>>>>>>> # A tibble: 5 x 4
>>>>>>>     STATION_NUMBER YEAR_FROM YEAR_TO RECORD
>>>>>>>              <chr>     <int>   <int>  <chr>
>>>>>>> 1        07EA001      1960    1960    QMS
>>>>>>> 2        07EA001      1961    1970    QMC
>>>>>>> 3        07EA001      1971    1971    QMM
>>>>>>> 4        07EA001      1972    1976    QMC
>>>>>>> 5        07EA001      1977    1983    QRC
>>>>>>>
>>>>>>> I would like to reshape this to one operational record (row) per
>>>>>>> year per station. Something like:
>>>>>>>
>>>>>>> 07EA001              1960      QMS
>>>>>>> 07EA001              1961      QMC
>>>>>>> 07EA001              1962      QMC
>>>>>>> 07EA001              1963      QMC
>>>>>>> ...
>>>>>>> 07EA001              1971      QMM
>>>>>>>
>>>>>>> Can this be done in dplyr easily?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> David


Re: dplyr - add/expand rows

Tóth Dénes
Hi Martin,

On 11/29/2017 10:46 PM, Martin Morgan wrote:
> On 11/29/2017 04:15 PM, Tóth Dénes wrote:
>> Hi,
>>
>> A benchmarking study with an additional (data.table-based) solution.
>
> I don't think speed is the right benchmark (I do agree that correctness
> is!).

Well, agreed, and sorry for the wording. It was really just an exercise
and not a full evaluation of the approaches. When I read the avalanche
of solutions, none of which mentioned data.table (my first choice for
data.frame manipulations), I became curious how a one-line data.table
solution performs against the others in terms of speed and
readability.
Second, I quite often have the feeling that dplyr is extremely overused
among novice (and sometimes even experienced) R users nowadays. This is
unfortunate, as the present example also illustrates.

Regards,
Denes


--
Dr. Tóth Dénes, managing director
Kogentum Kft.
Tel.: 06-30-2583723
Web: www.kogentum.hu

Re: dplyr - add/expand rows

Martin Morgan-3
On 11/29/2017 05:47 PM, Tóth Dénes wrote:

> Hi Martin,
>
> On 11/29/2017 10:46 PM, Martin Morgan wrote:
>> On 11/29/2017 04:15 PM, Tóth Dénes wrote:
>>> Hi,
>>>
>>> A benchmarking study with an additional (data.table-based) solution.
>>
>> I don't think speed is the right benchmark (I do agree that
>> correctness is!).
>
> Well, agreed, and sorry for the wording. It was really just an exercise
> and not a full evaluation of the approaches. When I read the avalanche
> of solutions, none of which mentioned data.table (my first choice for
> data.frame manipulations), I became curious how a one-line data.table
> solution performs against the others in terms of speed and
> readability.
> Second, I quite often have the feeling that dplyr is extremely overused
> among novice (and sometimes even experienced) R users nowadays. This is
> unfortunate, as the present example also illustrates.

Another solution combines Bill's approach with dplyr's mutate() (using
1L rather than 1 to keep integers integers!)

fun_bill1 <- function(d) {
   ## one row index per covered year, and the offset within each range
   i <- rep(seq_len(nrow(d)), d$to - d$from + 1L)
   j <- sequence(d$to - d$from + 1L)
   ## d[i,] %>% mutate(year = from + j - 1L, from = NULL, to = NULL)
   mutate(d[i,], year = from + j - 1L, from = NULL, to = NULL)
}

which is competitive with IRanges and data.table (the more dplyr-ish?
solution

   d[i, ] %>% mutate(year = from + j - 1L) %>%
       select(station, record, year))

has intermediate performance) and might appeal to those introduced to R
through dplyr but wanting more base R knowledge, and vice versa. I think
if dplyr introduces new users to R, or exposes R users to new approaches
for working with data, that's great!
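
As a small illustration of why the 1L matters, the following base R sketch
compares the types produced by 1 and 1L. The from/to values are made up for
the example (they are not the benchmark data), and year_int / year_dbl are
just illustrative names.

```r
from <- c(60L, 61L)
to   <- c(60L, 70L)

typeof(to - from + 1)    # "double": the literal 1 is a double
typeof(to - from + 1L)   # "integer": 1L is an integer literal

n <- to - from + 1L
i <- rep(seq_along(from), n)   # row index, repeated once per covered year
j <- sequence(n)               # 1..n[k] within each range

year_int <- from[i] + j - 1L   # stays integer
year_dbl <- from[i] + j - 1    # silently promoted to double
```

The two results compare equal with ==, but identical() tells them apart,
which can matter when the outputs of different solutions are compared
strictly.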

Martin


>
> Regards,
> Denes
>
>>
>> For the R-help list, maybe something about least specialized R
>> knowledge required would be appropriate? I'd say there were some
>> 'hard' solutions -- Michael (deep understanding of Bioconductor and
>> IRanges), Toth (deep understanding of data.table), Jim (at least for
>> me moderate understanding of dplyr, especially the .$ notation; a
>> simpler dplyr answer might have moved this response out of the
>> 'difficult' category, especially given the familiarity of the OP with
>> dplyr). I'd vote for Bill's as requiring the least specialized
>> knowledge of R (though the +/- 1 indexing is an easy thing to get wrong).
>>
>> A different criterion might be reuse across analysis scenarios. Bill
>> seems to win here again, since the principles are very general and at
>> least moderately efficient (both Bert and Martin's solutions are
>> essentially R-level iterations and have poor scalability, as
>> demonstrated in the microbenchmarks; Bill's is mostly vectorized).
>> Certainly data.table, dplyr, and IRanges are extremely useful within
>> the confines of the problem domains they address.
>>
>> Martin
>>
>>> Enjoy! ;)
>>>
>>> Cheers,
>>> Denes
>>>
>>>
>>> --------------------------
>>>
>>>
>>> ## packages ##########################
>>>
>>> library(dplyr)
>>> library(data.table)
>>> library(IRanges)
>>> library(microbenchmark)
>>>
>>> ## prepare example dataset ###########
>>>
>>> ## use Bert's example, with 2000 stations instead of 2
>>> d_df <- data.frame( station = rep(rep(c("one","two"),c(5,4)), 1000L),
>>>                      from = as.integer(c(60,61,71,72,76,60,65,82,83)),
>>>                      to = as.integer(c(60,70,71,76,83,64, 81, 82,83)),
>>>                      record = c("A","B","C","B","D","B","B","D","E"),
>>>                      stringsAsFactors = FALSE)
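>>> stations <- rle(d_df$station)
>>> ## rle() names its component 'values'; use the full name rather than
>>> ## relying on partial matching of 'value'
>>> stations$values <- gsub(
>>>    " ", "0",
>>>    paste0("station", format(seq_along(stations$values), width = 6)))
>>> d_df$station <- rep(stations$values, stations$lengths)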
>>>
>>> ## prepare tibble and data.table versions
>>> d_tbl <- as_tibble(d_df)
>>> d_dt <- as.data.table(d_df)
>>>
>>> ## solutions ##########################
>>>
>>> ## Bert - by
>>> fun_bert <- function(d) {
>>>    out <- by(
>>>      d, d$station, function(x) with(x, {
>>>        i <- to - from +1
>>>        data.frame(record =rep(record,i),
>>>                   year =sequence(i) -1 + rep(from,i),
>>>                   stringsAsFactors = FALSE)
>>>      }))
>>>    data.frame(station = rep(names(out), sapply(out,nrow)),
>>>               do.call(rbind,out),
>>>               row.names = NULL,
>>>               stringsAsFactors = FALSE)
>>> }
>>>
>>> ## Bill - transform
>>> fun_bill <- function(d) {
>>>    i <- rep(seq_len(nrow(d)), d$to-d$from+1)
>>>    j <- sequence(d$to-d$from+1)
>>>    transform(d[i,], year=from+j-1, from=NULL, to=NULL)
>>> }
>>>
>>> ## Michael - IRanges
>>> fun_michael <- function(d) {
>>>    df <- with(d, DataFrame(station, record, year=IRanges(from, to)))
>>>    expand(df, "year")
>>> }
>>>
>>> ## Jim - dplyr
>>> fun_jim <- function(d) {
>>>    d %>%
>>>      rowwise() %>%
>>>      do(tibble(station = .$station,
>>>                record = .$record,
>>>                year = seq(.$from, .$to))
>>>      )
>>> }
>>>
>>> ## Martin - Map
>>> fun_martin <- function(d) {
>>>    d$year <- with(d, Map(seq, from, to))
>>>    res0 <- with(d, Map(data.frame,
>>>                        station=station,
>>>                        record=record,
>>>                        year=year,
>>>                        MoreArgs = list(stringsAsFactors = FALSE)))
>>>    do.call(rbind, unname(res0))
>>> }
>>>
>>> ## Denes - simple data.table
>>> fun_denes <- function(d) {
>>>    out <- d[, .(year = from:to), by = .(station, from, record)]
>>>    out[, from := NULL]
>>> }
>>>
>>> ## Check equality ################################
>>> all.equal(fun_bill(d_df), fun_bert(d_df),
>>>            check.attributes = FALSE)
>>> all.equal(fun_bill(d_df), fun_martin(d_df),
>>>            check.attributes = FALSE)
>>> all.equal(fun_bill(d_df), as.data.frame(fun_michael(d_df)),
>>>            check.attributes = FALSE)
>>> all.equal(fun_bill(d_df), as.data.frame(fun_denes(d_dt)),
>>>            check.attributes = FALSE)
>>> # Be prepared: this solution is super slow
>>> all.equal(fun_bill(d_df), as.data.frame(fun_jim(d_tbl)),
>>>            check.attributes = FALSE)
>>>
>>> ## Benchmark #####################################
>>>
>>> ## Martin
>>> print(system.time(fun_martin(d_df)))
>>>
>>> ## Bert
>>> print(system.time(fun_bert(d_df)))
>>>
>>> ## Top 3
>>> print(
>>>    microbenchmark(
>>>      fun_bill(d_df),
>>>      fun_michael(d_df),
>>>      fun_denes(d_dt),
>>>      times = 100L
>>>    )
>>> )
>>>
>>>
>>> -------------------------
>>>
>>> On 11/28/2017 06:49 PM, Michael Lawrence wrote:
>>>> Or with the Bioconductor IRanges package:
>>>>
>>>> df <- with(input, DataFrame(station, year=IRanges(from, to), record))
>>>> expand(df, "year")
>>>>
>>>> DataFrame with 24 rows and 3 columns
>>>>          station     year      record
>>>>      <character> <integer> <character>
>>>> 1       07EA001      1960         QMS
>>>> 2       07EA001      1961         QMC
>>>> 3       07EA001      1962         QMC
>>>> 4       07EA001      1963         QMC
>>>> 5       07EA001      1964         QMC
>>>> ...         ...       ...         ...
>>>> 20      07EA001      1979         QRC
>>>> 21      07EA001      1980         QRC
>>>> 22      07EA001      1981         QRC
>>>> 23      07EA001      1982         QRC
>>>> 24      07EA001      1983         QRC
>>>>
>>>> If you tell the computer more about your data, it can do more things
>>>> for you.
>>>>
>>>> Michael
>>>>
>>>> On Tue, Nov 28, 2017 at 7:34 AM, Martin Morgan <
>>>> [hidden email]> wrote:
>>>>
>>>>> On 11/26/2017 08:42 PM, jim holtman wrote:
>>>>>
>>>>>> try this:
>>>>>>
>>>>>> ##########################################
>>>>>>
>>>>>> library(dplyr)
>>>>>>
>>>>>> input <- tribble(
>>>>>>     ~station, ~from, ~to, ~record,
>>>>>>    "07EA001" ,    1960  ,  1960  , "QMS",
>>>>>>    "07EA001"  ,   1961 ,   1970  , "QMC",
>>>>>>    "07EA001" ,    1971  ,  1971  , "QMM",
>>>>>>    "07EA001" ,    1972  ,  1976  , "QMC",
>>>>>>    "07EA001" ,    1977  ,  1983  , "QRC"
>>>>>> )
>>>>>>
>>>>>> result <- input %>%
>>>>>>     rowwise() %>%
>>>>>>     do(tibble(station = .$station,
>>>>>>               year = seq(.$from, .$to),
>>>>>>               record = .$record)
>>>>>>     )
>>>>>>
>>>>>> ###########################
>>>>>>
>>>>>
>>>>> In a bit more 'base R' mode I did
>>>>>
>>>>>    input$year <- with(input, Map(seq, from, to))
>>>>>    res0 <- with(input, Map(data.frame, station=station, year=year,
>>>>>        record=record))
>>>>>    as_tibble(do.call(rbind, unname(res0)))
>>>>>
>>>>> resulting in
>>>>>
>>>>> > as_tibble(do.call(rbind, unname(res0)))
>>>>> # A tibble: 24 x 3
>>>>>     station  year record
>>>>>      <fctr> <int> <fctr>
>>>>>   1 07EA001  1960    QMS
>>>>>   2 07EA001  1961    QMC
>>>>>   3 07EA001  1962    QMC
>>>>>   4 07EA001  1963    QMC
>>>>>   5 07EA001  1964    QMC
>>>>>   6 07EA001  1965    QMC
>>>>>   7 07EA001  1966    QMC
>>>>>   8 07EA001  1967    QMC
>>>>>   9 07EA001  1968    QMC
>>>>> 10 07EA001  1969    QMC
>>>>> # ... with 14 more rows
>>>>>
>>>>> I thought I should have been able to use `tibble` in the second
>>>>> step, but that leads to a (cryptic) error
>>>>>
>>>>> > res0 <- with(input, Map(tibble, station=station, year=year,
>>>>> +     record=record))
>>>>> Error in captureDots(strict = `__quosured`) :
>>>>>    the argument has already been evaluated
>>>>>
>>>>> The 'station' and 'record' columns are factors, so different from the
>>>>> original input, but this seems the appropriate data type for these
>>>>> columns.
>>>>>
>>>>> It's interesting to compare the 'specialized' knowledge needed for
>>>>> each approach -- rowwise(), do(), .$ for tidyverse, with(), do.call(),
>>>>> maybe rbind() and Map() for base R.
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Jim Holtman
>>>>>> Data Munger Guru
>>>>>>
>>>>>> What is the problem that you are trying to solve?
>>>>>> Tell me what you want to do, not how you want to do it.
>>>>>>
>>>>>> On Sun, Nov 26, 2017 at 2:10 PM, Bert Gunter <[hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>> To David W.'s point about lack of a suitable reprex ("reproducible
>>>>>>> example"), Bill's solution seems to be for only one station.
>>>>>>>
>>>>>>> Here is a reprex and modification that I think does what was
>>>>>>> requested for multiple stations, again using base R and data
>>>>>>> frames, not dplyr and tibbles.
>>>>>>>
>>>>>>> First the reprex with **two** stations:
>>>>>>>
>>>>>>> > d <- data.frame( station = rep(c("one","two"),c(5,4)),
>>>>>>> +                  from = c(60,61,71,72,76,60,65,82,83),
>>>>>>> +                  to = c(60,70,71,76,83,64, 81, 82,83),
>>>>>>> +                  record = c("A","B","C","B","D","B","B","D","E"))
>>>>>>>
>>>>>>> > d
>>>>>>>     station from to record
>>>>>>> 1     one   60 60      A
>>>>>>> 2     one   61 70      B
>>>>>>> 3     one   71 71      C
>>>>>>> 4     one   72 76      B
>>>>>>> 5     one   76 83      D
>>>>>>> 6     two   60 64      B
>>>>>>> 7     two   65 81      B
>>>>>>> 8     two   82 82      D
>>>>>>> 9     two   83 83      E
>>>>>>>
>>>>>>> ## Now the conversion code using base R, especially by():
>>>>>>>
>>>>>>> > out <- by(d, d$station, function(x) with(x, {
>>>>>>> +    i <- to - from +1
>>>>>>> +    data.frame(YEAR =sequence(i) -1 +rep(from,i),
>>>>>>> +               RECORD =rep(record,i))
>>>>>>> + }))
>>>>>>>
>>>>>>> > out <- data.frame(station = rep(names(out),sapply(out,nrow)),
>>>>>>> +                   do.call(rbind,out), row.names = NULL)
>>>>>>>
>>>>>>> > out
>>>>>>>      station YEAR RECORD
>>>>>>> 1      one   60      A
>>>>>>> 2      one   61      B
>>>>>>> 3      one   62      B
>>>>>>> 4      one   63      B
>>>>>>> 5      one   64      B
>>>>>>> 6      one   65      B
>>>>>>> 7      one   66      B
>>>>>>> 8      one   67      B
>>>>>>> 9      one   68      B
>>>>>>> 10     one   69      B
>>>>>>> 11     one   70      B
>>>>>>> 12     one   71      C
>>>>>>> 13     one   72      B
>>>>>>> 14     one   73      B
>>>>>>> 15     one   74      B
>>>>>>> 16     one   75      B
>>>>>>> 17     one   76      B
>>>>>>> 18     one   76      D
>>>>>>> 19     one   77      D
>>>>>>> 20     one   78      D
>>>>>>> 21     one   79      D
>>>>>>> 22     one   80      D
>>>>>>> 23     one   81      D
>>>>>>> 24     one   82      D
>>>>>>> 25     one   83      D
>>>>>>> 26     two   60      B
>>>>>>> 27     two   61      B
>>>>>>> 28     two   62      B
>>>>>>> 29     two   63      B
>>>>>>> 30     two   64      B
>>>>>>> 31     two   65      B
>>>>>>> 32     two   66      B
>>>>>>> 33     two   67      B
>>>>>>> 34     two   68      B
>>>>>>> 35     two   69      B
>>>>>>> 36     two   70      B
>>>>>>> 37     two   71      B
>>>>>>> 38     two   72      B
>>>>>>> 39     two   73      B
>>>>>>> 40     two   74      B
>>>>>>> 41     two   75      B
>>>>>>> 42     two   76      B
>>>>>>> 43     two   77      B
>>>>>>> 44     two   78      B
>>>>>>> 45     two   79      B
>>>>>>> 46     two   80      B
>>>>>>> 47     two   81      B
>>>>>>> 48     two   82      D
>>>>>>> 49     two   83      E
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Bert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Bert Gunter
>>>>>>>
>>>>>>> "The trouble with having an open mind is that people keep coming
>>>>>>> along
>>>>>>> and
>>>>>>> sticking things into it."
>>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>>>
>>>>>>> On Sat, Nov 25, 2017 at 4:49 PM, William Dunlap via R-help <
>>>>>>> [hidden email]> wrote:
>>>>>>>
>>>>>>>> dplyr may have something for this, but in base R I think the
>>>>>>>> following does
>>>>>>>> what you want.  I've shortened the name of your data set to 'd'.
>>>>>>>>
>>>>>>>> i <- rep(seq_len(nrow(d)), d$YEAR_TO-d$YEAR_FROM+1)
>>>>>>>> j <- sequence(d$YEAR_TO-d$YEAR_FROM+1)
>>>>>>>> transform(d[i,], YEAR=YEAR_FROM+j-1, YEAR_FROM=NULL, YEAR_TO=NULL)
>>>>>>>>
>>>>>>>>
>>>>>>>> Bill Dunlap
>>>>>>>> TIBCO Software
>>>>>>>> wdunlap tibco.com
>>>>>>>>
>>>>>>>> On Sat, Nov 25, 2017 at 11:18 AM, Hutchinson, David (EC) <
>>>>>>>> [hidden email]> wrote:
>>>>>>>>
>>>>>>>>> I have a returned tibble of station operational record similar
>>>>>>>>> to the following:
>>>>>>>>>
>>>>>>>>> > data.collection
>>>>>>>>> # A tibble: 5 x 4
>>>>>>>>>     STATION_NUMBER YEAR_FROM YEAR_TO RECORD
>>>>>>>>>              <chr>     <int>   <int>  <chr>
>>>>>>>>> 1        07EA001      1960    1960    QMS
>>>>>>>>> 2        07EA001      1961    1970    QMC
>>>>>>>>> 3        07EA001      1971    1971    QMM
>>>>>>>>> 4        07EA001      1972    1976    QMC
>>>>>>>>> 5        07EA001      1977    1983    QRC
>>>>>>>>>
>>>>>>>>> I would like to reshape this to one operational record (row)
>>>>>>>>> per year per station. Something like:
>>>>>>>>>
>>>>>>>>> 07EA001              1960      QMS
>>>>>>>>> 07EA001              1961      QMC
>>>>>>>>> 07EA001              1962      QMC
>>>>>>>>> 07EA001              1963      QMC
>>>>>>>>> ...
>>>>>>>>> 07EA001              1971      QMM
>>>>>>>>>
>>>>>>>>> Can this be done in dplyr easily?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> David