ddply (or other suitable solution) question


R help mailing list-2
Dear All,

I have a data frame:
set.seed(123.456)
df <-data.frame(ID=c(1,1,2,2,2,3,3,3,3,4,4,5,5),
                read=c(1,1,0,1,1,1,0,0,0,1,0,0,0),
                int=c(1,1,0,0,0,1,1,0,0,1,1,1,1),
                z=rnorm(13,1,5),
                y=rnorm(13,1,5))

What I would like to achieve (as best as I see it now) is to create a nested list from the data in df. The top level would have one element per group in the ID column, and within each group the bottom level would hold each row "joined together" with the row that follows it, so the result would look like this:

[[ID=1]]
[[1]][[1]]
  ID read int        z        y
  1    1   1 5.188935 5.107905
  1    1   1 1.766866 4.443201
[[ID=2]]
[[2]][[1]]
  ID read int         z         y
  2    0   0 -4.690685 3.7695883
  2    1   0  7.269075 0.6904414
[[ID=2]]
[[2]][[2]]
  ID read int        z          y
  2    1   0 7.269075  0.6904414
  2    1   0 3.132321 -0.5298133
[[ID=3]]
[[3]][[1]]
  ID read int          z         y
  3    1   1 -0.4753574 -0.902355
  3    0   1  5.4756283 -2.473535
[[ID=3]]
[[3]][[2]]
  ID read int        z           y
  3    0   1 5.475628 -2.47353489
  3    0   0 5.390667 -0.03958639


I hope the example is clear enough. All your help is appreciated,

thanks,



Andras 

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: ddply (or other suitable solution) question

Jeremie Juste
Andras Farkas via R-help <[hidden email]> writes:

Hello,

 set.seed(123.456)
 
 df <-data.frame(ID=c(1,1,2,2,2,3,3,3,3,4,4,5,5),
 read=c(1,1,0,1,1,1,0,0,0,1,0,0,0),
 int=c(1,1,0,0,0,1,1,0,0,1,1,1,1),
 z=rnorm(13,1,5),
 y=rnorm(13,1,5))

Maybe this will suffice?

lapply(unique(df$ID),function(x) df[df$ID==x,])
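
As a possible alternative to the lapply() call above, base R's split() produces the same per-ID data frames and also names the list elements by ID. This is only a minimal sketch: the name by_id is made up for illustration, and set.seed(123) is used on the assumption that set.seed() truncates 123.456 to the integer 123.

```r
# Same example data as in the thread.
set.seed(123)
df <- data.frame(ID   = c(1,1,2,2,2,3,3,3,3,4,4,5,5),
                 read = c(1,1,0,1,1,1,0,0,0,1,0,0,0),
                 int  = c(1,1,0,0,0,1,1,0,0,1,1,1,1),
                 z    = rnorm(13, 1, 5),
                 y    = rnorm(13, 1, 5))

# One data frame per ID; the list is named by the ID values.
by_id <- split(df, df$ID)

names(by_id)          # "1" "2" "3" "4" "5"
sapply(by_id, nrow)   # rows per ID: 2 3 4 2 2
```

Named elements allow lookups such as by_id[["3"]], which the unnamed list returned by lapply(unique(...)) does not.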



HTH,

Jeremie


Re: ddply (or other suitable solution) question

Bert Gunter-2
In reply to this post by R help mailing list-2
What if there is only one read in the id?


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Sep 13, 2018 at 12:11 PM Andras Farkas via R-help
<[hidden email]> wrote:

> [...]

Re: ddply (or other suitable solution) question

Bert Gunter-2
Modulo my earlier question, it seems that you just want to duplicate the
interior rows within an id (all but the first and last) if there are more
than 2 rows. If this is incorrect, ignore the rest of this post.

Otherwise...

(I assume the data frame is listed in ID order, whatever that is)

set.seed(123.456)
df <-data.frame(ID=c(1,1,2,2,2,3,3,3,3,4,4,5,5),
                read=c(1,1,0,1,1,1,0,0,0,1,0,0,0),
                int=c(1,1,0,0,0,1,1,0,0,1,1,1,1),
                z=rnorm(13,1,5),
                y=rnorm(13,1,5))

yielded, on my Mac with R version 3.5.1:

> df
   ID read int          z           y
1   1    1   1 -1.8023782  1.55341358
2   1    1   1 -0.1508874 -1.77920567
3   2    0   0  8.7935416  9.93456568
4   2    1   0  1.3525420  3.48925239
5   2    1   0  1.6464387 -8.83308578
6   3    1   1  9.5753249  4.50677951
7   3    0   1  3.3045810 -1.36395704
8   3    0   0 -5.3253062 -4.33911853
9   3    0   0 -2.4342643 -0.08987457
10  4    1   1 -1.2283099 -4.13002224
11  4    0   1  7.1204090 -2.64445615
12  5    0   1  2.7990691 -2.12519634
13  5    0   1  3.0038573 -7.43346655

## The following doubles up the rows by ID
> ix <- tapply(seq_len(nrow(df)),df$ID,
+              function(x){
+                 lenx <- length(x)
+                 if(lenx > 2)
+                    c(x[1],rep(x[2]:x[lenx-1],each=2),x[lenx])
+                 else x
+              }
+    )
> ix
$`1`
[1] 1 2

$`2`
[1] 3 4 4 5

$`3`
[1] 6 7 7 8 8 9

$`4`
[1] 10 11

$`5`
[1] 12 13
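
A side note on the index construction above: x[2]:x[lenx-1] builds a sequence between two row numbers, so it relies on each ID's rows being consecutive in df (the stated assumption about ID order). A minimal sketch of a variant that indexes into x instead, and so would not need that assumption, might look like this. The helper name double_interior is hypothetical and not from the thread.

```r
# Hypothetical helper (not from the thread): duplicate the interior entries of
# a vector of row indices, without assuming the indices are consecutive.
double_interior <- function(x) {
  n <- length(x)
  if (n > 2) x[c(1, rep(2:(n - 1), each = 2), n)] else x
}

double_interior(6:9)          # 6 7 7 8 8 9
double_interior(c(2, 7, 11))  # 2 7 7 11
double_interior(c(10, 11))    # 10 11
```

It could be passed to tapply() in place of the anonymous function, as in tapply(seq_len(nrow(df)), df$ID, double_interior).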

## now use the ix list to break up df:

> lapply(ix, function(i)df[i,])
$`1`
  ID read int          z         y
1  1    1   1 -1.8023782  1.553414
2  1    1   1 -0.1508874 -1.779206

$`2`
    ID read int        z         y
3    2    0   0 8.793542  9.934566
4    2    1   0 1.352542  3.489252
4.1  2    1   0 1.352542  3.489252
5    2    1   0 1.646439 -8.833086

$`3`
    ID read int         z           y
6    3    1   1  9.575325  4.50677951
7    3    0   1  3.304581 -1.36395704
7.1  3    0   1  3.304581 -1.36395704
8    3    0   0 -5.325306 -4.33911853
8.1  3    0   0 -5.325306 -4.33911853
9    3    0   0 -2.434264 -0.08987457

$`4`
   ID read int         z         y
10  4    1   1 -1.228310 -4.130022
11  4    0   1  7.120409 -2.644456

$`5`
   ID read int        z         y
12  5    0   1 2.799069 -2.125196
13  5    0   1 3.003857 -7.433467

I leave it to you to modify the lapply() function to break up each id
data frame into sublists of pairs if that is what you wish to do.
Assuming again that this is actually what you want.
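
One way the remaining step could be approached, sketched here under assumptions of my own rather than taken from the thread, is to skip the doubled index and build the overlapping pairs directly from each per-ID data frame. The names consecutive_pairs and pairs_by_id are hypothetical, and returning an empty list for a group with fewer than two rows is an arbitrary choice, since the thread does not settle how such groups should be handled.

```r
# Same example data as in the thread (set.seed(123.456) is assumed to behave
# like set.seed(123)).
set.seed(123)
df <- data.frame(ID   = c(1,1,2,2,2,3,3,3,3,4,4,5,5),
                 read = c(1,1,0,1,1,1,0,0,0,1,0,0,0),
                 int  = c(1,1,0,0,0,1,1,0,0,1,1,1,1),
                 z    = rnorm(13, 1, 5),
                 y    = rnorm(13, 1, 5))

# Hypothetical helper: list of overlapping two-row data frames for one group.
# A group with fewer than two rows yields an empty list (an assumption).
consecutive_pairs <- function(g) {
  if (nrow(g) < 2) return(list())
  lapply(seq_len(nrow(g) - 1), function(i) g[i:(i + 1), ])
}

# Top level: one element per ID. Bottom level: one data frame per pair.
pairs_by_id <- lapply(split(df, df$ID), consecutive_pairs)

lengths(pairs_by_id)     # pairs per ID: 1 2 3 1 1
pairs_by_id[["3"]][[2]]  # rows 7 and 8 of df
```

Because each pair is taken from the original rows, the row names stay as in df rather than picking up suffixes such as 4.1.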

Bert Gunter

On Thu, Sep 13, 2018 at 1:40 PM Bert Gunter <[hidden email]> wrote:

> [...]

Re: ddply (or other suitable solution) question

Jeff Newmiller
In reply to this post by R help mailing list-2
The input example seems explicit enough, but I get confused understanding your desired output. Can you create an example data structure in your global environment "by hand" and use dput to give it to us?
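
For readers unfamiliar with this request, a minimal sketch of what building a structure "by hand" and sharing it with dput() could look like follows. The small data frame d and the name wanted are hypothetical stand-ins, not the thread's data.

```r
# Small stand-in data frame (hypothetical values, not the thread's data).
d <- data.frame(ID = c(1, 1, 2, 2, 2), z = c(0.1, 0.2, 0.3, 0.4, 0.5))

# Desired structure written out by hand: one list per ID, each holding
# two-row data frames of consecutive rows.
wanted <- list(
  `1` = list(d[1:2, ]),
  `2` = list(d[3:4, ], d[4:5, ])
)

# dput() prints R code that recreates the object; that text can be pasted
# into a message.
dput(wanted)

# Round-trip check: the deparsed text evaluates back to an equal object.
restored <- eval(parse(text = deparse(wanted)))
all.equal(restored, wanted)  # TRUE
```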

On September 13, 2018 12:11:21 PM PDT, Andras Farkas via R-help <[hidden email]> wrote:

>[...]

--
Sent from my phone. Please excuse my brevity.


Re: ddply (or other suitable solution) question

R help mailing list-2
In reply to this post by Bert Gunter-2
Thank you all, Bert's idea will get it done. Good question also about what happens if an ID has only 1 row: I have a separate plan for that. Anyhow, finishing up Bert's lines with
z<-lapply(ix, function(i)   df[i,])
lapply(z, function(x) split(x, rep(1:ceiling(nrow(x)/2), each=2)[1:nrow(x)]))


seems to do what I need,
thanks again...

Andras 
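
To make the split() step above easier to follow, here is a small sketch of the grouping vector it builds. The function name pair_groups is hypothetical; it only wraps the expression rep(1:ceiling(n/2), each=2)[1:n] used in the code above.

```r
# Grouping vector used by split(): 1 1 2 2 3 3 ..., trimmed to n entries.
# pair_groups is a hypothetical name for the expression used above.
pair_groups <- function(n) rep(1:ceiling(n / 2), each = 2)[1:n]

pair_groups(2)  # 1 1
pair_groups(4)  # 1 1 2 2
pair_groups(6)  # 1 1 2 2 3 3
pair_groups(3)  # 1 1 2  (an odd count leaves a one-row last piece)

# Applied to a doubled index such as ix$`3` from Bert's code:
idx <- c(6, 7, 7, 8, 8, 9)
split(idx, pair_groups(length(idx)))
# $`1`: 6 7   $`2`: 7 8   $`3`: 8 9
```

Since Bert's doubling turns a group of k rows (k > 2) into 2(k - 1) rows, the doubled frames always have an even row count, so every piece has exactly two rows; the odd case arises only for a one-row ID.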

    On Thursday, September 13, 2018, 5:16:54 PM EDT, Bert Gunter <[hidden email]> wrote:  
 
[...]