rle with data.table - is it possible?


Beejai
I'm trying to use rle together with data.table and wondering whether that is possible...

To make this simple, my ultimate goal is to determine long stretches of
1s, but I want to do this within groups (hence using data.table, as
I use the "set key" option).  However, I'm not having much luck
making this work.

For example, for simplicity's sake, I have the following data:

Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C

And the following code which I know works

hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]

hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]

hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
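
As a small aside on what rle() returns, here is a sketch on a made-up 0/1
vector (the vector het is purely illustrative and not taken from the data
above). The lengths of the runs of 1s are picked out the same way as in
the code above, except that rle() is called only once and its result reused.

```r
# Illustrative input: 1 marks a homozygous call (AA or RR), 0 anything else
het <- c(1, 1, 0, 1, 1, 1)

r <- rle(het)

r$lengths                         # 2 1 3  (length of each run)
r$values                          # 1 0 1  (value repeated in each run)

# Lengths of the runs of 1s only
runs_of_ones <- r$lengths[r$values == 1]
runs_of_ones                      # 2 3
```

Note that the result has one element per run, not one per row, which matters
when it is assigned back into a table.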

However, I wish to do the above by Group (the real file is
millions of rows long and the groups will be larger, but I just wanted to
simplify the example).

I did something like this but of course I got an error:

LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]

The reason is that I eventually want to have something like this:

Dad Mum Child Group sumdad summum sumchild
AA RR RA A 2 2 0
AA RR RR A 2 2 1
AA AA AA B 4 5 5
AA AA AA B 4 5 5
RA AA RR B 0 5 5
RR AA RR B 4 5 5
AA AA AA B 4 5 5
AA AA RA C 3 3 0
AA AA RA C 3 3 0
AA RR RA C

That is, I would like to have the specific counts next to what I'm
consecutively counting per group.  So for Group A there are 2 AAs for
dad and 2 RRs for mum, but only 1 AA or RR for the child, and that is
the RR (so the 1 is next to the RR and not the RA).

Can this be done?

K.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: rle with data.table - is it possible?

Jeff Newmiller
I do not understand the value of using the rle function in your
description, but the code below appears to produce the table you want.

Note that better support for the data.table package might be found at
stackexchange as the documentation specifies.

x <- read.table( text=
"Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header=TRUE, stringsAsFactors=FALSE )

library(data.table)
DT <- data.table( x )
DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
DT[ , sumdad := 0L ]
DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
DT[ , cdad := NULL ]
DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
DT[ , summum := 0L ]
DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
DT[ , cmum := NULL ]
DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
DT[ , sumchild := 0L ]
DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
DT[ , cchild := NULL ]

> DT
    Dad Mum Child Group sumdad summum sumchild
 1:  AA  RR    RA     A      2      2        0
 2:  AA  RR    RR     A      2      2        1
 3:  AA  AA    AA     B      4      5        5
 4:  AA  AA    AA     B      4      5        5
 5:  RA  AA    RR     B      0      5        5
 6:  RR  AA    RR     B      4      5        5
 7:  AA  AA    AA     B      4      5        5
 8:  AA  AA    RA     C      3      3        0
 9:  AA  AA    RA     C      3      3        0
10:  AA  RR    RA     C      3      3        0
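
If the aim is instead the length of each consecutive run within a group
(which is what rle measures), one possible sketch follows. It assumes a
helper called run_len (a hypothetical name, not part of data.table) and uses
only the Dad and Group columns of the example. The helper expands the run
lengths back to one value per row with rep(), so that the result has the
same length as the group and can be assigned with := and by=Group. Note that
it gives 2 rather than 4 for the homozygous Group B rows, because the RA row
splits the run, so it does not reproduce the hand-made table exactly.

```r
library(data.table)

# Dad and Group columns from the example data
DT <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Group = c("A",  "A",  "B",  "B",  "B",  "B",  "B",  "C",  "C",  "C")
)

# Hypothetical helper: for a logical vector, return for every element the
# length of the run it belongs to if the run is TRUE, and 0 otherwise.
run_len <- function(hom) {
  r <- rle(hom)
  rep(ifelse(r$values, r$lengths, 0L), times = r$lengths)
}

# One value per row, computed separately within each Group
DT[, rundad := run_len(Dad %in% c("AA", "RR")), by = Group]

DT$rundad   # 2 2 2 2 0 2 2 3 3 3
```

The same call could be repeated for the Mum and Child columns.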

On Tue, 30 Dec 2014, Kate Ignatius wrote:

> I'm trying to use both these packages and wondering whether they are possible...

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

Re: rle with data.table - is it possible?

Beejai
Is it possible to add the following code or similar in data.table:

childseg<-0
x:=sumchild <-0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''

I was hoping to do this code by Group for mum, dad and
child.  The problem I'm having is with the
span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
be added to data.table.

[Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
<[hidden email]> wrote:

> I do not understand the value of using the rle function in your description,
> but the code below appears to produce the table you want.

Re: rle with data.table - is it possible?

Jeff Newmiller
Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand-generated result would help? A new input data frame might or might not be needed to illustrate the desired results.

Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.
Sent from my phone. Please excuse my brevity.

On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]> wrote:

>Is it possible to add the following code or similar in data.table:
>
>childseg<-0
>x:=sumchild <-0
>span<-rle(x)$lengths[rle(x)$values==TRUE
>childseg[x]<-rep(seq_along(span), times = span)
>childseg[childseg == 0]<-''

Re: rle with data.table - is it possible?

Beejai
Apologies - mix up of syntax all over the place, a habit of mine.  The
last line was in there because of code beforehand so it really doesn't
need to be there.  Here is the proper code I hope:

childseg<-0
x<-sumchild ==0
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = span)
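
One way the numbering above might be done per Group in data.table is
sketched here. It assumes the sumchild and Group columns from the table
posted earlier in the thread, and a helper called seg_id (a hypothetical
name, not part of data.table). Rows outside a TRUE run get 0 instead of
being left unassigned, which is a choice made for this sketch.

```r
library(data.table)

# sumchild and Group columns as in the table posted earlier in the thread
DT <- data.table(
  sumchild = c(0L, 1L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L),
  Group    = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

# Hypothetical helper: number the runs of TRUE as 1, 2, 3, ... and give
# every element of a FALSE run the value 0.
seg_id <- function(flag) {
  r <- rle(flag)
  ids <- as.integer(cumsum(r$values) * r$values)
  rep(ids, times = r$lengths)
}

# Segment numbers restart at 1 within each Group
DT[, childseg := seg_id(sumchild == 0), by = Group]

DT$childseg   # 1 0 0 0 0 0 0 1 1 1
```

Because the helper returns one value per row, the result always matches the
group size that := expects when by=Group is used.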


On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
<[hidden email]> wrote:

> Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand generated result would help? A new input data frame might or might not be needed to illustrate desired results.

Re: rle with data.table - is it possible?

David Winsemius

> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>
> Apologies - mix up of syntax all over the place, a habit of mine.  The
> last line was in there because of code beforehand so it really doesn't
> need to be there.  Here is the proper code I hope:
>
> childseg<-0
> x<-sumchild ==0
> span<-rle(x)$lengths[rle(x)$values==TRUE]
> childseg[x]<-rep(seq_along(span), times = span)
>

This remains not reproducible. We have no idea what sumchild might be and the code throws an error. My guess is that you are trying to get a result such as would be delivered by:

childseg <- sumchild[ sumchild != 0 ]


David.
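
For comparison, a reproducible version of the quoted snippet might look
like the sketch below. It assumes sumchild is the column from the table
posted earlier in the thread (an assumption, since the snippet itself never
defines it) and preallocates childseg to full length so that unmarked rows
are 0 rather than NA.

```r
# Assumed input: the sumchild column from the table earlier in the thread
sumchild <- c(0L, 1L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L)

x <- sumchild == 0                 # TRUE where the count is zero
r <- rle(x)
span <- r$lengths[r$values]        # lengths of the TRUE runs: 1 3

childseg <- integer(length(x))     # start with 0 for every row
childseg[x] <- rep(seq_along(span), times = span)

childseg                           # 1 0 0 0 0 0 0 2 2 2

# The one-line guess above gives something different: the nonzero counts
nonzero <- sumchild[sumchild != 0]
nonzero                            # 1 5 5 5 5 5
```

The two results differ in both length and meaning, which is why a stated
expected output would help pin down what is wanted.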

>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>> ---------------------------------------------------------------------------
>>>> Jeff Newmiller                        The     .....       .....  Go
>>> Live...
>>>> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live
>>> Go...
>>>>                                      Live:   OO#.. Dead: OO#..
>>> Playing
>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>> /Software/Embedded Controllers)               .OO#.       .OO#.
>>> rocks...1k
>>>>
>>> ---------------------------------------------------------------------------
>>
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: rle with data.table - is it possible?

Beejai
Ah, crap.  Yep you're right.  This is not going too well. Okay - let
me try that again:

x$childseg<-0
x<-x$sumchild !=0
span<-rle(x)$lengths[rle(x)$values==TRUE]
x$childseg[x]<-rep(seq_along(span), times = span)

Does this one have any errors?


On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]> wrote:

>
>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>
>> Apologies - mix up of syntax all over the place, a habit of mine.  The
>> last line was in there because of code beforehand so it really doesn't
>> need to be there.  Here is the proper code I hope:
>>
>> childseg<-0
>> x<-sumchild ==0
>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>> childseg[x]<-rep(seq_along(span), times = span)
>>
>
> This remains not reproducible. We have no idea what sumchild might be and the code throws an error. My guess is that you are trying to get a result such as would be delivered by:
>
> childseg <- sumchild[ sumchild != 0 ]
>
> —
> David.
>


Re: rle with data.table - is it possible?

David Winsemius

On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:

> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
> me try that again:
>
> x$childseg<-0
> x<-x$sumchild !=0

That previous line would appear to overwrite the entire data frame with the value of one vector.

> span<-rle(x)$lengths[rle(x)$values==TRUE]
> x$childseg[x]<-rep(seq_along(span), times = span)
>
> Does this one have any errors?

Even assuming that the code from Jeff Newmiller is creating those objects, I get:

> x$childseg[x]<-rep(seq_along(span), times = span)
Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors

In the last line you are applying `$` to "x", which by that point is a logical vector and no longer a dataframe (or perhaps a data.table).
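
The failure can be reproduced in isolation. The following is a minimal sketch in base R, assuming a toy data frame `df` with made-up `sumchild` values (neither the name `df` nor the values come from the thread); it shows the error and one way to avoid it by keeping the logical mask under its own name (`keep`, also a hypothetical name).

```r
# Toy data frame standing in for the thread's object; values are illustrative.
df <- data.frame(sumchild = c(0L, 1L, 5L, 5L, 0L))

x <- df                  # mimic the post, where the data frame is called x
x$childseg <- 0
x <- x$sumchild != 0     # x is now a plain logical vector; the data frame is gone

# `$` cannot be applied to an atomic vector, so this assignment fails.
msg <- tryCatch({
  x$childseg[x] <- 1
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Keeping the mask under a separate name leaves the data frame intact.
keep <- df$sumchild != 0
df$childseg <- 0
df$childseg[keep] <- 1
print(df)
```

The only change that matters is that the comparison result is never assigned back to the name that holds the data.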

If we use Newmiller's object and then change some of the instances of "x" in your code to DT we get:

> DT$childseg<-0
> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
> span<-rle(x)$lengths[rle(x)$values==TRUE]
> DT$childseg[x]<-rep(seq_along(span), times = span)
> DT
    Dad Mum Child Group sumdad summum sumchild childseg
 1:  AA  RR    RA     A      2      2        0        0
 2:  AA  RR    RR     A      2      2        1        1
 3:  AA  AA    AA     B      4      5        5        1
 4:  AA  AA    AA     B      4      5        5        1
 5:  RA  AA    RR     B      0      5        5        1
 6:  RR  AA    RR     B      4      5        5        1
 7:  AA  AA    AA     B      4      5        5        1
 8:  AA  AA    RA     C      3      3        0        0
 9:  AA  AA    RA     C      3      3        0        0
10:  AA  RR    RA     C      3      3        0        0

You persist in posting code where you do not explain what you are trying to do with it. You have already been told that your earlier efforts using `rle` did not make any sense. Post a complete example and then explain what you desire as an object. It's often helpful to provide a scientific background for what the data represents.

--
David.


David Winsemius
Alameda, CA, USA


Re: rle with data.table - is it possible?

Beejai
Obviously this is why I need help...

This is a larger data frame.  I'm only posting something small here to
make it simple.  There are many Groups, each larger than in this example,
and I want to assign a sequence value to each run of consecutive rows
where sumchild is not equal to 0.  In the full data the sequence value
goes up to 100, maybe even 200, and there are 20K+ different groups.
I would like to do this for every group, not for the whole data frame.

There is no particular science behind this, only data organizing.

So just say we had data like so:

    Dad Mum Child Group sumdad summum sumchild childseg
 1:  AA  RR    RA     A      2      2        0        0
 2:  AA  RR    RR     A      2      2        1        1
 3:  AA  AA    AA     B      4      5        5        1
 4:  AA  AA    RA     B      4      5        5        0
 5:  RA  AA    RR     B      0      5        5        2
 6:  RR  AA    RR     B      4      5        5        2
 7:  AA  AA    AA     B      4      5        5        2
 8:  AA  AA    AA     C      3      3        0        1
 9:  AA  AA    RA     C      3      3        0        0
10:  AA  RR    RR     C      3      3        0        2
11:  AA  RR    RA     C      2      2        0        0
12:  AA  RR    RR     C      2      2        1        3
13:  AA  AA    AA     C      4      5        5        3
14:  AA  AA    RA     C      4      5        5        0
15:  RA  AA    RR     C      0      5        5        4
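
A childseg column like the one in the table above could be produced along the following lines. This is only a sketch under assumptions, not a confirmed answer from the thread: the flag is taken to be "Child is AA or RR" (in the hand-made table childseg follows the Child column rather than sumchild), and `run_index` is a hypothetical helper name. It uses base `rle` inside a data.table `by=` group so the numbering restarts in each Group.

```r
library(data.table)

# Number the runs of TRUE in a logical vector: 1, 2, 3, ... for successive
# runs, and 0 wherever the flag is FALSE.
run_index <- function(flag) {
  r <- rle(flag)
  ids <- integer(length(r$values))          # one slot per run, all 0 to start
  ids[r$values] <- seq_len(sum(r$values))   # label only the TRUE runs
  rep(ids, times = r$lengths)               # expand back to one value per row
}

# Child and Group columns copied from the example table.
DT <- data.table(
  Child = c("RA", "RR", "AA", "RA", "RR", "RR", "AA",
            "AA", "RA", "RR", "RA", "RR", "AA", "RA", "RR"),
  Group = c("A", "A", "B", "B", "B", "B", "B",
            "C", "C", "C", "C", "C", "C", "C", "C")
)

# Assumed flag: the child genotype is homozygous (AA or RR).
DT[, childseg := run_index(Child %in% c("AA", "RR")), by = Group]
print(DT)
```

If the flag really should be `sumchild != 0`, the same helper applies unchanged; only the expression passed to `run_index` would differ.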

On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R]
<[hidden email]> wrote:

>
> On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
>
>> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
>> me try that again:
>>
>> x$childseg<-0
>> x<-x$sumchild !=0
>
> That previous line would appear to overwrite the entire dataframe with the
> value of one vector
>
>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>> x$childseg[x]<-rep(seq_along(span), times = span)
>>
>> Does this one have any errors?
> Even assuming that the code from Jeff Newmiller is creating those objects I
> get
>
>> x$childseg[x]<-rep(seq_along(span), times = span)
> Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors
>
> In the last line you are indexing a vector with a dataframe (or perhaps a
> data.table).
>
> If we use Newmiller's object and then change some of the instances of "x" in
> your code to DT we get:
>
>> DT$childseg<-0
>> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>> DT$childseg[x]<-rep(seq_along(span), times = span)
>> DT
>     Dad Mum Child Group sumdad summum sumchild childseg
>  1:  AA  RR    RA     A      2      2        0        0
>  2:  AA  RR    RR     A      2      2        1        1
>  3:  AA  AA    AA     B      4      5        5        1
>  4:  AA  AA    AA     B      4      5        5        1
>  5:  RA  AA    RR     B      0      5        5        1
>  6:  RR  AA    RR     B      4      5        5        1
>  7:  AA  AA    AA     B      4      5        5        1
>  8:  AA  AA    RA     C      3      3        0        0
>  9:  AA  AA    RA     C      3      3        0        0
> 10:  AA  RR    RA     C      3      3        0        0
>
> You persist in posting code where you do not explain what you are trying to
> do with it. You have already been told that your earlier efforts using `rle`
> did not make any sense. Post a complete example and then explain what you
> desire as an object. It's often helpful to provide a scientific background
> for what the data represents.
>
> --
> David.
>
>>
>>
>> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]> wrote:
>>>
>>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>>>
>>>> Apologies - mix up of syntax all over the place, a habit of mine.  The
>>>> last line was in there because of code beforehand so it really doesn't
>>>> need to be there.  Here is the proper code I hope:
>>>>
>>>> childseg<-0
>>>> x<-sumchild ==0
>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>
>>>
>>> This remains not reproducible. We have no idea what sumchild might be and
>>> the code throws an error. My guess is that you are trying to get a result
>>> such as would be delivered by:
>>>
>>> childseg <- sumchild[ sumchild != 0 ]
>>>
>>> —
>>> David.
>>>
>>>>
>>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>>> <[hidden email]> wrote:
>>>>> Thank you for attempting to encode what you want using R syntax, but
>>>>> you are not really succeeding yet (too many errors). Perhaps another hand
>>>>> generated result would help? A new input data frame might or might not be
>>>>> needed to illustrate desired results.
>>>>>
>>>>> Your second and third lines are  syntactically incorrect, and I don't
>>>>> understand what you hope to accomplish by assigning an empty string to a
>>>>> numeric in your last line.
>>>>>
>>>>> ---------------------------------------------------------------------------
>>>>> Jeff Newmiller                        The     .....       .....  Go
>>>>> Live...
>>>>> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
>>>>>                                     Live:   OO#.. Dead: OO#..  Playing
>>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>>> /Software/Embedded Controllers)               .OO#.       .OO#.
>>>>> rocks...1k
>>>>>
>>>>> ---------------------------------------------------------------------------
>>>>> Sent from my phone. Please excuse my brevity.
>>>>>
>>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]>
>>>>> wrote:
>>>>>> Is it possible to add the following code or similar in data.table:
>>>>>>
>>>>>> childseg<-0
>>>>>> x:=sumchild <-0
>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>> childseg[childseg == 0]<-''
>>>>>>
>>>>>> I was hoping to do this code by Group for mum, dad and
>>>>>> child.  The problem I'm having is with the
>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
>>>>>> be added to data.table.
>>>>>>
>>>>>> [Previous email had incorrect code]
>>>>>>
>>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>>> <[hidden email]> wrote:
>>>>>>> I do not understand the value of using the rle function in your
>>>>>> description,
>>>>>>> but the code below appears to produce the table you want.
>>>>>>>
>>>>>>> Note that better support for the data.table package might be found at
>>>>>>> stackexchange as the documentation specifies.
>>>>>>>
>>>>>>> x <- read.table( text=
>>>>>>> "Dad Mum Child Group
>>>>>>> AA RR RA A
>>>>>>> AA RR RR A
>>>>>>> AA AA AA B
>>>>>>> AA AA AA B
>>>>>>> RA AA RR B
>>>>>>> RR AA RR B
>>>>>>> AA AA AA B
>>>>>>> AA AA RA C
>>>>>>> AA AA RA C
>>>>>>> AA RR RA C
>>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>>>
>>>>>>> library(data.table)
>>>>>>> DT <- data.table( x )
>>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>>> DT[ , sumdad := 0L ]
>>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>>> DT[ , cdad := NULL ]
>>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>>> DT[ , summum := 0L ]
>>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>>> DT[ , cmum := NULL ]
>>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>>> DT[ , sumchild := 0L ]
>>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>>> DT[ , cchild := NULL ]
>>>>>>>
>>>>>>>> DT
>>>>>>>
>>>>>>>   Dad Mum Child Group sumdad summum sumchild
>>>>>>> 1:  AA  RR    RA     A      2      2        0
>>>>>>> 2:  AA  RR    RR     A      2      2        1
>>>>>>> 3:  AA  AA    AA     B      4      5        5
>>>>>>> 4:  AA  AA    AA     B      4      5        5
>>>>>>> 5:  RA  AA    RR     B      0      5        5
>>>>>>> 6:  RR  AA    RR     B      4      5        5
>>>>>>> 7:  AA  AA    AA     B      4      5        5
>>>>>>> 8:  AA  AA    RA     C      3      3        0
>>>>>>> 9:  AA  AA    RA     C      3      3        0
>>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>>>
>>>>>>>> I'm trying to use both these packages and wondering whether they are
>>>>>>>> possible...
>>>>>>>>
>>>>>>>> To make this simple, my ultimate goal is determine long stretches of
>>>>>>>> 1s, but I want to do this within groups (hence using the data.table
>>>>>> as
>>>>>>>> I use the "set key" option.  However, I'm I'm not having much luck
>>>>>>>> making this possible.
>>>>>>>>
>>>>>>>> For example, for simplistic sake, I have the following data:
>>>>>>>>
>>>>>>>> Dad Mum Child Group
>>>>>>>> AA RR RA A
>>>>>>>> AA RR RR A
>>>>>>>> AA AA AA B
>>>>>>>> AA AA AA B
>>>>>>>> RA AA RR B
>>>>>>>> RR AA RR B
>>>>>>>> AA AA AA B
>>>>>>>> AA AA RA C
>>>>>>>> AA AA RA C
>>>>>>>> AA RR RA  C
>>>>>>>>
>>>>>>>> And the following code which I know works
>>>>>>>>
>>>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>>>
>>>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>>>
>>>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>>>
>>>>>>>> However, I wish to run the above code by Group (this file is
>>>>>>>> millions of rows long and the groups will be larger; I just wanted to
>>>>>>>> simplify the example).
>>>>>>>>
>>>>>>>> I did something like this but of course I got an error:
>>>>>>>>
>>>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>>>
>>>>>>>> The reason being as I want to eventually have something like this:
>>>>>>>>
>>>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>>>> AA RR RA A 2 2 0
>>>>>>>> AA RR RR A 2 2 1
>>>>>>>> AA AA AA B 4 5 5
>>>>>>>> AA AA AA B 4 5 5
>>>>>>>> RA AA RR B 0 5 5
>>>>>>>> RR AA RR B 4 5 5
>>>>>>>> AA AA AA B 4 5 5
>>>>>>>> AA AA RA C 3 3 0
>>>>>>>> AA AA RA C 3 3 0
>>>>>>>> AA RR RA  C 3 3 0
>>>>>>>>
>>>>>>>> That is, I would like to have the specific counts next to what I'm
>>>>>>>> consecutively counting per group.  So for Group A there are 2 AAs
>>>>>>>> for dad and two RRs for mum, but only 1 AA or RR for the child, and
>>>>>>>> that is RR (so the 1 is next to the RR and not the RA).
>>>>>>>>
>>>>>>>> Can this be done?
>>>>>>>>
>>>>>>>> K.
>>>>>>>>

Re: rle with data.table - is it possible?

Jeff Newmiller
The problem is that I cannot see how your use of rle and/or seq_along could possibly lead to the sample result you are giving us. That is why I asked for a new example.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
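
For readers following along, here is a minimal base-R sketch (not posted by anyone in the thread) of what the rle/seq_along idiom computes on the sumchild column of the 10-row table. The vector is typed in by hand from that table. Because the idiom never looks at Group, the single non-zero run spanning rows 2 to 7 gets one label, which is the all-1s result David reports.

```r
# sumchild column, copied by hand from the 10-row table in this thread
sumchild <- c(0, 1, 5, 5, 5, 5, 5, 0, 0, 0)

x <- sumchild != 0                  # TRUE where sumchild is non-zero
r <- rle(x)                         # runs: FALSE x 1, TRUE x 6, FALSE x 3
span <- r$lengths[r$values]         # lengths of the TRUE runs only
childseg <- integer(length(x))      # start with all zeros
childseg[x] <- rep(seq_along(span), times = span)  # label each TRUE run 1, 2, ...
childseg
# 0 1 1 1 1 1 1 0 0 0
```

Getting per-group labels would need the same steps run separately inside each Group rather than once over the whole column.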

On January 2, 2015 5:11:09 PM PST, Beejai <[hidden email]> wrote:

>Obviously this is why I need help...
>
>This is a larger data frame.  I'm only posting something small here to
>make it simple.  There are many Groups, which are larger, and I want to
>assign a sequence value to consecutive rows where sumchild is not
>equal to 0.  As the data frame I'm working with is much larger, this
>goes up to 100, maybe even 200, and I have many different groups (20K+).
>I would like to do this for every group, not for the whole data frame.
>
>There is no particular science behind this, only data organizing.
>
>So just say we had data like so:
>
>     Dad Mum Child Group sumdad summum sumchild childseg
>  1:  AA  RR    RA     A      2      2        0        0
>  2:  AA  RR    RR     A      2      2        1        1
>  3:  AA  AA    AA     B      4      5        5        1
>  4:  AA  AA    RA     B      4      5        5        0
>  5:  RA  AA    RR     B      0      5        5        2
>  6:  RR  AA    RR     B      4      5        5        2
>  7:  AA  AA    AA     B      4      5        5        2
>  8:  AA  AA    AA     C      3      3        0        1
>  9:  AA  AA    RA     C      3      3        0        0
> 10:  AA  RR    RR     C      3      3        0        2
> 11:  AA  RR    RA     C      2      2        0        0
> 12:  AA  RR    RR     C      2      2        1        3
> 13:  AA  AA    AA     C      4      5        5        3
> 14:  AA  AA    RA     C      4      5        5        0
> 15:  RA  AA    RR     C      0      5        5        4
>
>On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R]
><[hidden email]> wrote:
>>
>> On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
>>
>>> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
>>> me try that again:
>>>
>>> x$childseg<-0
>>> x<-x$sumchild !=0
>>
>> That previous line would appear to overwrite the entire dataframe with the
>> value of one vector.
>>
>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>>
>>> Does this one have any errors?
>> Even assuming that the code from Jeff Newmiller is creating those objects, I
>> get
>>
>>> x$childseg[x]<-rep(seq_along(span), times = span)
>> Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors
>>
>> In the last line you are indexing a vector with a dataframe (or perhaps a
>> data.table).
>>
>> If we use Newmiller's object and then change some of the instances of "x" in
>> your code to DT we get:
>>
>>> DT$childseg<-0
>>> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>> DT$childseg[x]<-rep(seq_along(span), times = span)
>>> DT
>>     Dad Mum Child Group sumdad summum sumchild childseg
>>  1:  AA  RR    RA     A      2      2        0        0
>>  2:  AA  RR    RR     A      2      2        1        1
>>  3:  AA  AA    AA     B      4      5        5        1
>>  4:  AA  AA    AA     B      4      5        5        1
>>  5:  RA  AA    RR     B      0      5        5        1
>>  6:  RR  AA    RR     B      4      5        5        1
>>  7:  AA  AA    AA     B      4      5        5        1
>>  8:  AA  AA    RA     C      3      3        0        0
>>  9:  AA  AA    RA     C      3      3        0        0
>> 10:  AA  RR    RA     C      3      3        0        0
>>
>> You persist in posting code where you do not explain what you are trying to
>> do with it. You have already been told that your earlier efforts using `rle`
>> did not make any sense. Post a complete example and then explain what you
>> desire as an object. It's often helpful to provide a scientific background
>> for what the data represents.
>>
>> --
>> David.
>>
>>>
>>>
>>> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]> wrote:
>>>>
>>>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>>>>
>>>>> Apologies - mix up of syntax all over the place, a habit of mine. The
>>>>> last line was in there because of code beforehand so it really doesn't
>>>>> need to be there.  Here is the proper code I hope:
>>>>>
>>>>> childseg<-0
>>>>> x<-sumchild ==0
>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>
>>>>
>>>> This remains not reproducible. We have no idea what sumchild might be and
>>>> the code throws an error. My guess is that you are trying to get a result
>>>> such as would be delivered by:
>>>>
>>>> childseg <- sumchild[ sumchild != 0 ]
>>>>
>>>> —
>>>> David.
>>>>
>>>>>
>>>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>>>> <[hidden email]> wrote:
>>>>>> Thank you for attempting to encode what you want using R syntax, but
>>>>>> you are not really succeeding yet (too many errors). Perhaps another hand
>>>>>> generated result would help? A new input data frame might or might not be
>>>>>> needed to illustrate desired results.
>>>>>>
>>>>>> Your second and third lines are syntactically incorrect, and I don't
>>>>>> understand what you hope to accomplish by assigning an empty string to a
>>>>>> numeric in your last line.
>>>>>>
>>>>>>
>>>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]>
>>>>>> wrote:
>>>>>>> Is it possible to add the following code or similar in data.table:
>>>>>>>
>>>>>>> childseg<-0
>>>>>>> x:=sumchild <-0
>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>> childseg[childseg == 0]<-''
>>>>>>>
>>>>>>> I was hoping to do this code by Group for mum, dad and
>>>>>>> child.  The problem I'm having is with the
>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can
>>>>>>> be added to data.table.
>>>>>>>
>>>>>>> [Previous email had incorrect code]
>>>>>>>
>>>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>>>> <[hidden email]> wrote:
>>>>>>>> I do not understand the value of using the rle function in your description,
>>>>>>>> but the code below appears to produce the table you want.
>>>>>>>>
>>>>>>>> Note that better support for the data.table package might be found at
>>>>>>>> stackexchange as the documentation specifies.
>>>>>>>>
>>>>>>>> x <- read.table( text=
>>>>>>>> "Dad Mum Child Group
>>>>>>>> AA RR RA A
>>>>>>>> AA RR RR A
>>>>>>>> AA AA AA B
>>>>>>>> AA AA AA B
>>>>>>>> RA AA RR B
>>>>>>>> RR AA RR B
>>>>>>>> AA AA AA B
>>>>>>>> AA AA RA C
>>>>>>>> AA AA RA C
>>>>>>>> AA RR RA C
>>>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>>>>
>>>>>>>> library(data.table)
>>>>>>>> DT <- data.table( x )
>>>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , sumdad := 0L ]
>>>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>>>> DT[ , cdad := NULL ]
>>>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , summum := 0L ]
>>>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>>>> DT[ , cmum := NULL ]
>>>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>>>> DT[ , sumchild := 0L ]
>>>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>>>> DT[ , cchild := NULL ]
>>>>>>>>
>>>>>>>>> DT
>>>>>>>>
>>>>>>>>   Dad Mum Child Group sumdad summum sumchild
>>>>>>>> 1:  AA  RR    RA     A      2      2        0
>>>>>>>> 2:  AA  RR    RR     A      2      2        1
>>>>>>>> 3:  AA  AA    AA     B      4      5        5
>>>>>>>> 4:  AA  AA    AA     B      4      5        5
>>>>>>>> 5:  RA  AA    RR     B      0      5        5
>>>>>>>> 6:  RR  AA    RR     B      4      5        5
>>>>>>>> 7:  AA  AA    AA     B      4      5        5
>>>>>>>> 8:  AA  AA    RA     C      3      3        0
>>>>>>>> 9:  AA  AA    RA     C      3      3        0
>>>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>>>>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: rle with data.table - is it possible?

Beejai
What are you having trouble with exactly?  Do you need a bigger
example?  The code works perfectly well with your code, so I'm not sure how
you are finding trouble with it (minus the fact that I had put in a
few errors myself at the beginning, for which I apologize).
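
One possible way to get the childseg column of the 15-row table is sketched below. It rests on an assumption that nobody in the thread confirmed: that childseg is meant to number the runs of homozygous (AA or RR) Child genotypes within each Group, with 0 on every other row. `seg_id` is a hypothetical helper name, and only the Child and Group columns are typed in from the table.

```r
library(data.table)

# Child and Group columns, copied by hand from the 15-row table
DT <- data.table(
  Child = c("RA", "RR", "AA", "RA", "RR", "RR", "AA",
            "AA", "RA", "RR", "RA", "RR", "AA", "RA", "RR"),
  Group = c("A", "A", "B", "B", "B", "B", "B",
            "C", "C", "C", "C", "C", "C", "C", "C")
)

# Hypothetical helper: number the TRUE runs of a logical vector 1, 2, 3, ...
# and give every element of a FALSE run the value 0.
seg_id <- function(flag) {
  r <- rle(flag)
  r$values <- ifelse(r$values, cumsum(r$values), 0L)
  inverse.rle(r)
}

# Apply the helper separately within each Group
DT[, childseg := seg_id(Child %in% c("AA", "RR")), by = Group]
DT$childseg
# 0 1 1 0 2 2 2 1 0 2 0 3 3 0 4
```

Under that assumption the result matches the hand-made childseg column row for row, and the same helper could be reused for the Dad and Mum columns.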

On Fri, Jan 2, 2015 at 9:05 PM, Jeff Newmiller [via R]
<[hidden email]> wrote:

> The problem is that I cannot see how your use of rle and/or seq_along could
> possibly lead to the sample result you are giving us. That is why I asked
> for a new example.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On January 2, 2015 5:11:09 PM PST, Beejai <[hidden email]> wrote:
>
>>Obviously this is why I need help...
>>
>>This is a larger data frame.  I'm only posting something small here to
>>make it simple.  There are many Groups which are larger, and I want to
>>assign a sequence value to consecutive rows where sumchild in not
>>equal to 0.  As the data frame I'm working with is much larger, this
>>goes up to 100 maybe even 200 and I have many different groups 20K+.
>>I would like to do this for every group, not for the whole data frame.
>>
>>There is no particular science behind this, only data organizing.
>>
>>So just say we had data like so:
>>
>>    Dad Mum Child Group sumdad summum sumchild childseg
>> 1:  AA  RR    RA     A      2      2        0        0
>> 2:  AA  RR    RR     A      2      2        1        1
>> 3:  AA  AA    AA     B      4      5        5        1
>> 4:  AA  AA    RA     B      4      5        5        0
>> 5:  RA  AA    RR     B      0      5        5        2
>> 6:  RR  AA    RR     B      4      5        5        2
>> 7:  AA  AA    AA     B      4      5        5        2
>> 8:  AA  AA    AA     C      3      3        0        1
>> 9:  AA  AA    RA     C      3      3        0        0
>>10:  AA  RR    RR     C      3      3        0        2
>> 11:  AA  RR    RA     C     2      2        0        0
>> 12:  AA  RR    RR     C      2      2        1        3
>> 13:  AA  AA    AA     C      4      5        5        3
>> 14:  AA  AA    RA     C      4      5        5        0
>> 15:  RA  AA    RR     C      0      5        5        4
>>
>>On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R]
>><[hidden email]> wrote:
>>>
>>> On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
>>>
>>>> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
>>>> me try that again:
>>>>
>>>> x$childseg<-0
>>>> x<-x$sumchild !=0
>>>
>>> That previous line would appear to overwrite the entire dataframe
>>with the
>>> value of one vector
>>>
>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>>>
>>>> Does this one have any errors?
>>> Even assuming that the code from Jeff Newmiller is creating those
>>objects I
>>> get
>>>
>>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>> Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors
>>>
>>> In the last line you are indexing a vector with a dataframe (or
>>perhaps a
>>> data.table).
>>>
>>> If we use Newmiller's object and then change some of the instances of
>>"x" in
>>> your code to DT we get:
>>>
>>>> DT$childseg<-0
>>>> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>> DT$childseg[x]<-rep(seq_along(span), times = span)
>>>> DT
>>>     Dad Mum Child Group sumdad summum sumchild childseg
>>>  1:  AA  RR    RA     A      2      2        0        0
>>>  2:  AA  RR    RR     A      2      2        1        1
>>>  3:  AA  AA    AA     B      4      5        5        1
>>>  4:  AA  AA    AA     B      4      5        5        1
>>>  5:  RA  AA    RR     B      0      5        5        1
>>>  6:  RR  AA    RR     B      4      5        5        1
>>>  7:  AA  AA    AA     B      4      5        5        1
>>>  8:  AA  AA    RA     C      3      3        0        0
>>>  9:  AA  AA    RA     C      3      3        0        0
>>> 10:  AA  RR    RA     C      3      3        0        0
>>>
>>> You persist in posting code where you do not explain what you are
>>trying to
>>> do with it. You have already been told that your earlier efforts
>>using `rle`
>>> did not make any sense. Post a complete example and then explain what
>>you
>>> desire as an object. It's often helpful to provide a scientific
>>background
>>> for what the data represents.
>>>
>>> --
>>> David.
>>>
>>>>
>>>>
>>>> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]>
>>wrote:
>>>>>
>>>>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>>>>>
>>>>>> Apologies - mix up of syntax all over the place, a habit of mine.
>>The
>>>>>> last line was in there because of code beforehand so it really
>>doesn't
>>>>>> need to be there.  Here is the proper code I hope:
>>>>>>
>>>>>> childseg<-0
>>>>>> x<-sumchild ==0
>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>
>>>>>
>>>>> This remains not reproducible. We have no idea what sumchild might
>>be and
>>>>> the code throws an error. My guess is that you are trying to get a
>>result
>>>>> such as would be delivered by:
>>>>>
>>>>> childseg <- sumchild[ sumchild != 0 ]
>>>>>
>>>>> —
>>>>> David.
>>>>>
>>>>>>
>>>>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>>>>> <[hidden email]> wrote:
>>>>>>> Thank you for attempting to encode what you want using R syntax,
>>but
>>>>>>> you are not really succeeding yet (too many errors). Perhaps
>>another hand
>>>>>>> generated result would help? A new input data frame might or
>>might not be
>>>>>>> needed to illustrate desired results.
>>>>>>>
>>>>>>> Your second and third lines are  syntactically incorrect, and I
>>don't
>>>>>>> understand what you hope to accomplish by assigning an empty
>>string to a
>>>>>>> numeric in your last line.
>>>>>>>
>>>>>>>
>>---------------------------------------------------------------------------
>>>>>>> Jeff Newmiller                        The     .....       .....
>>Go
>>>>>>> Live...
>>>>>>> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
>>>>>>>                                     Live:   OO#.. Dead: OO#..
>>Playing
>>>>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.
>>with
>>>>>>> /Software/Embedded Controllers)               .OO#.       .OO#.
>>>>>>> rocks...1k
>>>>>>>
>>>>>>>
>>---------------------------------------------------------------------------
>>>>>>> Sent from my phone. Please excuse my brevity.
>>>>>>>
>>>>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]>
>>>>>>> wrote:
>>>>>>>> Is it possible to add the following code or similar in
>>data.table:
>>>>>>>>
>>>>>>>> childseg<-0
>>>>>>>> x:=sumchild <-0
>>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>>> childseg[childseg == 0]<-''
>>>>>>>>
>>>>>>>> I was hoping to do this code by Group for mum, dad and
>>>>>>>> child.  The problem I'm having is with the
>>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure
>>can
>>>>>>>> be added to data.table.
>>>>>>>>
>>>>>>>> [Previous email had incorrect code]
>>>>>>>>
>>>>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>>>>> <[hidden email]> wrote:
>>>>>>>>> I do not understand the value of using the rle function in your
>>>>>>>> description,
>>>>>>>>> but the code below appears to produce the table you want.
>>>>>>>>>
>>>>>>>>> Note that better support for the data.table package might be
>>found at
>>>>>>>>> stackexchange as the documentation specifies.
>>>>>>>>>
>>>>>>>>> x <- read.table( text=
>>>>>>>>> "Dad Mum Child Group
>>>>>>>>> AA RR RA A
>>>>>>>>> AA RR RR A
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA AA B
>>>>>>>>> RA AA RR B
>>>>>>>>> RR AA RR B
>>>>>>>>> AA AA AA B
>>>>>>>>> AA AA RA C
>>>>>>>>> AA AA RA C
>>>>>>>>> AA RR RA C
>>>>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>>>>>
>>>>>>>>> library(data.table)
>>>>>>>>> DT <- data.table( x )
>>>>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>>>>> DT[ , sumdad := 0L ]
>>>>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>>>>> DT[ , cdad := NULL ]
>>>>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>>>>> DT[ , summum := 0L ]
>>>>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>>>>> DT[ , cmum := NULL ]
>>>>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>>>>> DT[ , sumchild := 0L ]
>>>>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>>>>> DT[ , cchild := NULL ]
>>>>>>>>>
>>>>>>>>>> DT
>>>>>>>>>
>>>>>>>>>   Dad Mum Child Group sumdad summum sumchild
>>>>>>>>> 1:  AA  RR    RA     A      2      2        0
>>>>>>>>> 2:  AA  RR    RR     A      2      2        1
>>>>>>>>> 3:  AA  AA    AA     B      4      5        5
>>>>>>>>> 4:  AA  AA    AA     B      4      5        5
>>>>>>>>> 5:  RA  AA    RR     B      0      5        5
>>>>>>>>> 6:  RR  AA    RR     B      4      5        5
>>>>>>>>> 7:  AA  AA    AA     B      4      5        5
>>>>>>>>> 8:  AA  AA    RA     C      3      3        0
>>>>>>>>> 9:  AA  AA    RA     C      3      3        0
>>>>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>>>>>
>>>>>>>>>> I'm trying to use both these packages and wondering whether
>>they are
>>>>>>>>>> possible...
>>>>>>>>>>
>>>>>>>>>> To make this simple, my ultimate goal is determine long
>>stretches of
>>>>>>>>>> 1s, but I want to do this within groups (hence using the
>>data.table
>>>>>>>> as
>>>>>>>>>> I use the "set key" option.  However, I'm I'm not having much
>>luck
>>>>>>>>>> making this possible.
>>>>>>>>>>
>>>>>>>>>> For example, for simplistic sake, I have the following data:
>>>>>>>>>>
>>>>>>>>>> Dad Mum Child Group
>>>>>>>>>> AA RR RA A
>>>>>>>>>> AA RR RR A
>>>>>>>>>> AA AA AA B
>>>>>>>>>> AA AA AA B
>>>>>>>>>> RA AA RR B
>>>>>>>>>> RR AA RR B
>>>>>>>>>> AA AA AA B
>>>>>>>>>> AA AA RA C
>>>>>>>>>> AA AA RA C
>>>>>>>>>> AA RR RA  C
>>>>>>>>>>
>>>>>>>>>> And the following code which I know works
>>>>>>>>>>
>>>>>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>>>>>
>>>>>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>>>>>
>>>>>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>>>>>
>>>>>>>>>> However, I wish to do the above code by Group (though this
>>file is
>>>>>>>>>> millions of rows long and groups will be larger but just
>>wanted to
>>>>>>>>>> simply the example).
>>>>>>>>>>
>>>>>>>>>> I did something like this but of course I got an error:
>>>>>>>>>>
>>>>>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>>>>>>
>>LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>>>>>>
>>LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>>>>>
>>>>>>>>>> The reason being as I want to eventually have something like
>>this:
>>>>>>>>>>
>>>>>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>>>>>> AA RR RA A 2 2 0
>>>>>>>>>> AA RR RR A 2 2 1
>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>> RA AA RR B 0 5 5
>>>>>>>>>> RR AA RR B 4 5 5
>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>>> AA RR RA  C 3 3 0
>>>>>>>>>>
>>>>>>>>>> That is, I would like to have the specific counts next to what
>>I'm
>>>>>>>>>> consecutively counting per group.  So for Group A for dad
>>there are
>>>>>>>> 2
>>>>>>>>>> AAs,  there are two RRs for mum but only 1 AA or RR for the
>>child
>>>>>>>> and
>>>>>>>>>> that is RR (so the 1 is next to the RR and not the RA).
>>>>>>>>>>
>>>>>>>>>> Can this be done?
>>>>>>>>>>
>>>>>>>>>> K.
>>>>>>>>>>

Re: rle with data.table - is it possible?

Jeff Newmiller
In reply to this post by Jeff Newmiller
Here is what I get when I try to use your algorithm:

library(data.table)

# Number each run of TRUE values in s (1, 2, ...); FALSE positions get 0.
myf <- function( s ) {
   seg <- rep( 0, length( s ) )
   rs <- rle( s )
   span <- rs$lengths[ rs$values ]   # lengths of the TRUE runs only
   seg[ s ] <- rep( seq_along( span ), times = span )
   seg
}

# x is the example data frame read in earlier in this thread
DT <- data.table( x )
DT[ , dadseg := myf( Dad %in% c( "AA", "RR" ) ), by=Group ]
DT[ , mumseg := myf( Mum %in% c( "AA", "RR" ) ), by=Group ]
DT[ , childseg := myf( Child %in% c( "AA", "RR" ) ), by=Group ]
> DT
     Dad Mum Child Group dadseg mumseg childseg
  1:  AA  RR    RA     A      1      1        0
  2:  AA  RR    RR     A      1      1        1
  3:  AA  AA    AA     B      1      1        1
  4:  AA  AA    AA     B      1      1        1
  5:  RA  AA    RR     B      0      1        1
  6:  RR  AA    RR     B      2      1        1
  7:  AA  AA    AA     B      2      1        1
  8:  AA  AA    RA     C      1      1        0
  9:  AA  AA    RA     C      1      1        0
 10:  AA  RR    RA     C      1      1        0

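If the aim is the length of each stretch rather than its index (the original question asked for the counts next to the rows being counted), the rle() result can be expanded back to one value per row. Below is a minimal sketch in base R. The helper name run_len is hypothetical and does not come from any post in this thread, and ave() stands in for the by=Group step so the example runs without data.table.

```r
# Hypothetical helper: for each element of a logical vector, the length of
# the run of TRUE values it belongs to; FALSE positions get 0.
run_len <- function(s) {
  r <- rle(s)
  # keep the lengths of TRUE runs, zero out FALSE runs, expand back to rows
  rep(ifelse(r$values, r$lengths, 0L), times = r$lengths)
}

# Dad and Group columns of the 10-row example used in this thread
dad   <- c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA")
group <- c("A",  "A",  "B",  "B",  "B",  "B",  "B",  "C",  "C",  "C")

homo <- dad %in% c("AA", "RR")
# ave() applies run_len() within each Group and returns values in row order
dadrun <- ave(as.integer(homo), group, FUN = function(v) run_len(v == 1L))
print(dadrun)
```

Under this reading, the Dad column in Group B gets 2 rather than 4, because the RA row splits the stretch in two. Inside a data.table the call would presumably take the same shape as the lines above, for example DT[ , dadrun := run_len( Dad %in% c( "AA", "RR" ) ), by=Group ].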

On Fri, 2 Jan 2015, Jeff Newmiller wrote:

> The problem is that I cannot see how your use of rle and/or seq_along
> could possibly lead to the sample result you are giving us. That is why
> I asked for a new example.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


Re: rle with data.table - is it possible?

Beejai
Ahh ... you may have missed this:

Larger example below.

So for the below example it will be:

    Dad Mum Child Group sumdad summum sumchild childseg
 1:  AA  RR    RA     A      2      2        0        0
 2:  AA  RR    RR     A      2      2        1        1
 3:  AA  AA    AA     B      4      5        5        1
 4:  AA  AA    RA     B      4      5        5        0
 5:  RA  AA    RR     B      0      5        5        2
 6:  RR  AA    RR     B      4      5        5        2
 7:  AA  AA    AA     B      4      5        5        2
 8:  AA  AA    AA     C      3      3        0        1
 9:  AA  AA    RA     C      3      3        0        0
10:  AA  RR    RR     C      3      3        0        2
11:  AA  RR    RA     C      2      2        0        0
12:  AA  RR    RR     C      2      2        1        3
13:  AA  AA    AA     C      4      5        5        3
14:  AA  AA    RA     C      4      5        5        0
15:  RA  AA    RR     C      0      5        5        4

Short explanation: I'm only posting something small here to make it
simple.  I have many more Groups, of course, but essentially I want to
assign a sequence value to consecutive rows where sumchild is not
equal to 0.  As the data frame I'm working with is much larger, this
goes up to 100, maybe even 200, and I have many different groups (20K+).
I would like to do this for every group, not for the whole data frame.
Hence the initial question of whether rle and data.table can
be used together.
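
For what it's worth, the childseg column in the table above appears to number the stretches where Child is AA or RR, restarting in each Group, rather than following sumchild. Under that reading (an assumption, not something confirmed in the thread), the myf() function posted earlier in the thread already produces it. Here is a base R sketch, with seg_id as a hypothetical name for that function and ave() standing in for data.table's by=Group:

```r
# Same logic as myf() from earlier in the thread, renamed seg_id here:
# number each run of TRUE values 1, 2, ...; FALSE positions get 0.
seg_id <- function(s) {
  seg <- integer(length(s))
  r <- rle(s)
  span <- r$lengths[r$values]          # lengths of the TRUE runs only
  seg[s] <- rep(seq_along(span), times = span)
  seg
}

# Child and Group columns of the 15-row example above
child <- c("RA", "RR", "AA", "RA", "RR", "RR", "AA",
           "AA", "RA", "RR", "RA", "RR", "AA", "RA", "RR")
group <- rep(c("A", "B", "C"), times = c(2, 5, 8))

homo <- child %in% c("AA", "RR")
# apply seg_id() separately within each Group, keeping row order
childseg <- ave(as.integer(homo), group, FUN = function(v) seg_id(v == 1L))
print(childseg)
```

The printed vector can be compared row by row with the childseg column in the table. With data.table the equivalent call would presumably be DT[ , childseg := seg_id( Child %in% c( "AA", "RR" ) ), by=Group ], which is the same form as the lines quoted below.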

On Sat, Jan 3, 2015 at 2:45 AM, Jeff Newmiller <[hidden email]> wrote:

> Here is what I get when I try to use your algorithm:
>
> myf <- function( s ) {
>   seg <- rep( 0, length( s ) )
>   rs <- rle( s )
>   span <- rs$lengths[ rs$values ]
>   seg[ s ] <- rep( seq_along( span ), times = span )
>   seg
> }
>
> DT <- data.table( x )
> DT[ , dadseg := myf( Dad %in% c( "AA", "RR" ) ), by=Group ]
> DT[ , mumseg := myf( Mum %in% c( "AA", "RR" ) ), by=Group ]
> DT[ , childseg := myf( Child %in% c( "AA", "RR" ) ), by=Group ]
> > DT
>     Dad Mum Child Group dadseg mumseg childseg
>  1:  AA  RR    RA     A      1      1        0
>  2:  AA  RR    RR     A      1      1        1
>  3:  AA  AA    AA     B      1      1        1
>  4:  AA  AA    AA     B      1      1        1
>  5:  RA  AA    RR     B      0      1        1
>  6:  RR  AA    RR     B      2      1        1
>  7:  AA  AA    AA     B      2      1        1
>  8:  AA  AA    RA     C      1      1        0
>  9:  AA  AA    RA     C      1      1        0
> 10:  AA  RR    RA     C      1      1        0
>
>
>
> On Fri, 2 Jan 2015, Jeff Newmiller wrote:
>
>> The problem is that I cannot see how your use of rle and/or seq_along
>> could possibly lead to the sample result you are giving us. That is why I
>> asked for a new example.
>>
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go
>> Live...
>> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live
>> Go...
>>                                      Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.
>> rocks...1k
>>
>> ---------------------------------------------------------------------------
>> Sent from my phone. Please excuse my brevity.
>>
>> On January 2, 2015 5:11:09 PM PST, Beejai <[hidden email]> wrote:
>>>
>>> Obviously this is why I need help...
>>>
>>> This is a larger data frame.  I'm only posting something small here to
>>> make it simple.  There are many Groups which are larger, and I want to
>>> assign a sequence value to consecutive rows where sumchild in not
>>> equal to 0.  As the data frame I'm working with is much larger, this
>>> goes up to 100 maybe even 200 and I have many different groups 20K+.
>>> I would like to do this for every group, not for the whole data frame.
>>>
>>> There is no particular science behind this, only data organizing.
>>>
>>> So just say we had data like so:
>>>
>>>    Dad Mum Child Group sumdad summum sumchild childseg
>>> 1:  AA  RR    RA     A      2      2        0        0
>>> 2:  AA  RR    RR     A      2      2        1        1
>>> 3:  AA  AA    AA     B      4      5        5        1
>>> 4:  AA  AA    RA     B      4      5        5        0
>>> 5:  RA  AA    RR     B      0      5        5        2
>>> 6:  RR  AA    RR     B      4      5        5        2
>>> 7:  AA  AA    AA     B      4      5        5        2
>>> 8:  AA  AA    AA     C      3      3        0        1
>>> 9:  AA  AA    RA     C      3      3        0        0
>>> 10:  AA  RR    RR     C      3      3        0        2
>>> 11:  AA  RR    RA     C     2      2        0        0
>>> 12:  AA  RR    RR     C      2      2        1        3
>>> 13:  AA  AA    AA     C      4      5        5        3
>>> 14:  AA  AA    RA     C      4      5        5        0
>>> 15:  RA  AA    RR     C      0      5        5        4
>>>
>>> On Fri, Jan 2, 2015 at 12:29 PM, David Winsemius [via R]
>>> <[hidden email]> wrote:
>>>>
>>>>
>>>> On Jan 2, 2015, at 12:07 AM, Kate Ignatius wrote:
>>>>
>>>>> Ah, crap.  Yep you're right.  This is not going too well. Okay - let
>>>>> me try that again:
>>>>>
>>>>> x$childseg<-0
>>>>> x<-x$sumchild !=0
>>>>
>>>>
>>>> That previous line would appear to overwrite the entire dataframe
>>>
>>> with the
>>>>
>>>> value of one vector
>>>>
>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>>>>
>>>>> Does this one have any errors?
>>>>
>>>> Even assuming that the code from Jeff Newmiller is creating those
>>>
>>> objects I
>>>>
>>>> get
>>>>
>>>>> x$childseg[x]<-rep(seq_along(span), times = span)
>>>>
>>>> Error in `*tmp*`$childseg : $ operator is invalid for atomic vectors
>>>>
>>>> In the last line you are indexing a vector with a dataframe (or
>>>
>>> perhaps a
>>>>
>>>> data.table).
>>>>
>>>> If we use Newmiller's object and then change some of the instances of
>>>
>>> "x" in
>>>>
>>>> your code to DT we get:
>>>>
>>>>> DT$childseg<-0
>>>>> x<-DT$sumchild !=0  # Try not to overwrite your data-objects
>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>> DT$childseg[x]<-rep(seq_along(span), times = span)
>>>>> DT
>>>>
>>>>     Dad Mum Child Group sumdad summum sumchild childseg
>>>>  1:  AA  RR    RA     A      2      2        0        0
>>>>  2:  AA  RR    RR     A      2      2        1        1
>>>>  3:  AA  AA    AA     B      4      5        5        1
>>>>  4:  AA  AA    AA     B      4      5        5        1
>>>>  5:  RA  AA    RR     B      0      5        5        1
>>>>  6:  RR  AA    RR     B      4      5        5        1
>>>>  7:  AA  AA    AA     B      4      5        5        1
>>>>  8:  AA  AA    RA     C      3      3        0        0
>>>>  9:  AA  AA    RA     C      3      3        0        0
>>>> 10:  AA  RR    RA     C      3      3        0        0
>>>>
>>>> You persist in posting code where you do not explain what you are
>>>
>>> trying to
>>>>
>>>> do with it. You have already been told that your earlier efforts
>>>
>>> using `rle`
>>>>
>>>> did not make any sense. Post a complete example and then explain what
>>>
>>> you
>>>>
>>>> desire as an object. It's often helpful to provide a scientific
>>>
>>> background
>>>>
>>>> for what the data represents.
>>>>
>>>> --
>>>> David.
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 2, 2015 at 2:32 AM, David Winsemius <[hidden email]>
>>>
>>> wrote:
>>>>>>
>>>>>>
>>>>>>> On Jan 1, 2015, at 5:07 PM, Kate Ignatius <[hidden email]> wrote:
>>>>>>>
>>>>>>> Apologies - mix up of syntax all over the place, a habit of mine.
>>>
>>> The
>>>>>>>
>>>>>>> last line was in there because of code beforehand so it really
>>>
>>> doesn't
>>>>>>>
>>>>>>> need to be there.  Here is the proper code I hope:
>>>>>>>
>>>>>>> childseg<-0
>>>>>>> x<-sumchild ==0
>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE]
>>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>>
>>>>>>
>>>>>> This remains not reproducible. We have no idea what sumchild might
>>>
>>> be and
>>>>>>
>>>>>> the code throws an error. My guess is that you are trying to get a
>>>
>>> result
>>>>>>
>>>>>> such as would be delivered by:
>>>>>>
>>>>>> childseg <- sumchild[ sumchild != 0 ]
>>>>>>
>>>>>> ?
>>>>>> David.
>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller
>>>>>>> <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Thank you for attempting to encode what you want using R syntax,
>>>
>>> but
>>>>>>>>
>>>>>>>> you are not really succeeding yet (too many errors). Perhaps
>>>
>>> another hand
>>>>>>>>
>>>>>>>> generated result would help? A new input data frame might or
>>>
>>> might not be
>>>>>>>>
>>>>>>>> needed to illustrate desired results.
>>>>>>>>
>>>>>>>> Your second and third lines are  syntactically incorrect, and I
>>>
>>> don't
>>>>>>>>
>>>>>>>> understand what you hope to accomplish by assigning an empty
>>>
>>> string to a
>>>>>>>>
>>>>>>>> numeric in your last line.
>>>>>>>>
>>>>>>>>
>>>
>>> ---------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Jeff Newmiller                        The     .....       .....
>>>
>>> Go
>>>>>>>>
>>>>>>>> Live...
>>>>>>>> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
>>>>>>>>                                     Live:   OO#.. Dead: OO#..
>>>
>>> Playing
>>>>>>>>
>>>>>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.
>>>
>>> with
>>>>>>>>
>>>>>>>> /Software/Embedded Controllers)               .OO#.       .OO#.
>>>>>>>> rocks...1k
>>>>>>>>
>>>>>>>>
>>>
>>> ---------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Sent from my phone. Please excuse my brevity.
>>>>>>>>
>>>>>>>> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Is it possible to add the following code or similar in
>>>
>>> data.table:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> childseg<-0
>>>>>>>>> x:=sumchild <-0
>>>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE
>>>>>>>>> childseg[x]<-rep(seq_along(span), times = span)
>>>>>>>>> childseg[childseg == 0]<-''
>>>>>>>>>
>>>>>>>>> I was hoping to do this code by Group for mum, dad and
>>>>>>>>> child.  The problem I'm having is with the
>>>>>>>>> span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure
>>>
>>> can
>>>>>>>>>
>>>>>>>>> be added to data.table.
>>>>>>>>>
>>>>>>>>> [Previous email had incorrect code]
>>>>>>>>>
>>>>>>>>> On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> I do not understand the value of using the rle function in your
>>>>>>>>>
>>>>>>>>> description,
>>>>>>>>>>
>>>>>>>>>> but the code below appears to produce the table you want.
>>>>>>>>>>
>>>>>>>>>> Note that better support for the data.table package might be
>>>
>>> found at
>>>>>>>>>>
>>>>>>>>>> stackexchange as the documentation specifies.
>>>>>>>>>>
>>>>>>>>>> x <- read.table( text=
>>>>>>>>>> "Dad Mum Child Group
>>>>>>>>>> AA RR RA A
>>>>>>>>>> AA RR RR A
>>>>>>>>>> AA AA AA B
>>>>>>>>>> AA AA AA B
>>>>>>>>>> RA AA RR B
>>>>>>>>>> RR AA RR B
>>>>>>>>>> AA AA AA B
>>>>>>>>>> AA AA RA C
>>>>>>>>>> AA AA RA C
>>>>>>>>>> AA RR RA C
>>>>>>>>>> ", header=TRUE, stringsAsFactors=FALSE )
>>>>>>>>>>
>>>>>>>>>> library(data.table)
>>>>>>>>>> DT <- data.table( x )
>>>>>>>>>> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
>>>>>>>>>> DT[ , sumdad := 0L ]
>>>>>>>>>> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
>>>>>>>>>> DT[ , cdad := NULL ]
>>>>>>>>>> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
>>>>>>>>>> DT[ , summum := 0L ]
>>>>>>>>>> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
>>>>>>>>>> DT[ , cmum := NULL ]
>>>>>>>>>> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
>>>>>>>>>> DT[ , sumchild := 0L ]
>>>>>>>>>> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
>>>>>>>>>> DT[ , cchild := NULL ]
>>>>>>>>>>
>>>>>>>>>>> DT
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     Dad Mum Child Group sumdad summum sumchild
>>>>>>>>>>  1:  AA  RR    RA     A      2      2        0
>>>>>>>>>>  2:  AA  RR    RR     A      2      2        1
>>>>>>>>>>  3:  AA  AA    AA     B      4      5        5
>>>>>>>>>>  4:  AA  AA    AA     B      4      5        5
>>>>>>>>>>  5:  RA  AA    RR     B      0      5        5
>>>>>>>>>>  6:  RR  AA    RR     B      4      5        5
>>>>>>>>>>  7:  AA  AA    AA     B      4      5        5
>>>>>>>>>>  8:  AA  AA    RA     C      3      3        0
>>>>>>>>>>  9:  AA  AA    RA     C      3      3        0
>>>>>>>>>> 10:  AA  RR    RA     C      3      3        0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm trying to use rle together with data.table and wondering whether
>>>>>>>>>>> that is possible...
>>>>>>>>>>>
>>>>>>>>>>> To make this simple, my ultimate goal is to determine long stretches of
>>>>>>>>>>> 1s, but I want to do this within groups (hence using data.table, as
>>>>>>>>>>> I use the "set key" option).  However, I'm not having much luck
>>>>>>>>>>> making this possible.
>>>>>>>>>>>
>>>>>>>>>>> For example, for simplicity's sake, I have the following data:
>>>>>>>>>>>
>>>>>>>>>>> Dad Mum Child Group
>>>>>>>>>>> AA RR RA A
>>>>>>>>>>> AA RR RR A
>>>>>>>>>>> AA AA AA B
>>>>>>>>>>> AA AA AA B
>>>>>>>>>>> RA AA RR B
>>>>>>>>>>> RR AA RR B
>>>>>>>>>>> AA AA AA B
>>>>>>>>>>> AA AA RA C
>>>>>>>>>>> AA AA RA C
>>>>>>>>>>> AA RR RA  C
>>>>>>>>>>>
>>>>>>>>>>> And the following code which I know works
>>>>>>>>>>>
>>>>>>>>>>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>>>>>>>>>>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>>>>>>>>>>
>>>>>>>>>>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>>>>>>>>>>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>>>>>>>>>>
>>>>>>>>>>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>>>>>>>>>>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>>>>>>>>>>
>>>>>>>>>>> However, I wish to do the above code by Group (though this file is
>>>>>>>>>>> millions of rows long and groups will be larger; I just wanted to
>>>>>>>>>>> simplify the example).
>>>>>>>>>>>
>>>>>>>>>>> I did something like this but of course I got an error:
>>>>>>>>>>>
>>>>>>>>>>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>>>>>>>>>>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>>>>>>>>>>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>>>>>>>>>>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>>>>>>>>>>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>>>>>>>>>>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The reason is that I eventually want to have something like this:
>>>>>>>>>>>
>>>>>>>>>>> Dad Mum Child Group sumdad summum sumchild
>>>>>>>>>>> AA RR RA A 2 2 0
>>>>>>>>>>> AA RR RR A 2 2 1
>>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>>> RA AA RR B 0 5 5
>>>>>>>>>>> RR AA RR B 4 5 5
>>>>>>>>>>> AA AA AA B 4 5 5
>>>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>>>> AA AA RA C 3 3 0
>>>>>>>>>>> AA RR RA  C 3 3 0
>>>>>>>>>>>
>>>>>>>>>>> That is, I would like to have the specific counts next to what I'm
>>>>>>>>>>> consecutively counting per group.  So for Group A for dad there are 2
>>>>>>>>>>> AAs, there are two RRs for mum but only 1 AA or RR for the child and
>>>>>>>>>>> that is RR (so the 1 is next to the RR and not the RA).
>>>>>>>>>>>
>>>>>>>>>>> Can this be done?
>>>>>>>>>>>
>>>>>>>>>>> K.
>>>>>>>>>>>
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
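
For the archive, here is a minimal sketch of the per-group rle() idea
discussed in this thread. It rebuilds the example table by hand; the helper
name run_len and the columns rundad, runmum and runchild are made up for
illustration and are not from anyone's posted code. Note that this gives the
length of the consecutive run each row sits in, so Dad in Group B comes out
as 2, 2, 0, 2, 2 rather than the per-group totals (4, 4, 0, 4, 4) shown in
the table quoted above.

```r
library(data.table)

# Example data copied from the original question
DT <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Mum   = c("RR", "RR", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "RR"),
  Child = c("RA", "RR", "AA", "AA", "RR", "RR", "AA", "RA", "RA", "RA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

# Hypothetical helper: for each row, the length of the consecutive
# homozygous (AA or RR) run it belongs to; 0 for heterozygous rows.
run_len <- function(g) {
  r <- rle(g %in% c("AA", "RR"))
  # Expand each run back to one value per row
  rep(ifelse(r$values, r$lengths, 0L), times = r$lengths)
}

# Apply the helper to each genotype column separately within each Group
DT[, c("rundad", "runmum", "runchild") := lapply(.SD, run_len),
   by = Group, .SDcols = c("Dad", "Mum", "Child")]

print(DT)
```

Because rle() is called once per group and the result is expanded back to
the group's row count with rep(), the assignment by reference lines up with
the original rows without any reordering.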

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.