

Dear R-helpers,
I need to count the maximum number of consecutive zero values of a variable
in a dataframe by different groups. My dataframe looks like this:
ID <- c(1,1,1,2,2,3,3,3,3)
x <- c(1,0,0,0,0,1,1,0,1)
df <- data.frame(ID=ID,x=x)
rm(ID,x)
So I want to get the max number of consecutive zeros of variable x for each
ID. I found rle() to be helpful for this task; so I did:
FUN <- function(x) {
  rles <- rle(x == 0)
}
consec <- lapply(split(df[,2],df[,1]), FUN)
consec is now a list containing one rle object for each ID. Each rle object
has $lengths: int, the length of every run of consecutive values, and $values: logi,
indicating whether the values in that run are zero or not.
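For readers new to lists: each element of consec can be pulled out by its ID name, and its $lengths and $values components can be used together to keep only the runs of zeros. The sketch below rebuilds the example data so that it runs on its own. The names max_zero, result and MaxZero are illustrative choices and do not come from the thread.

```r
# Example data from the question
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 x  = c(1, 0, 0, 0, 0, 1, 1, 0, 1))

# One rle object per ID, built the same way as in the question
consec <- lapply(split(df$x, df$ID), function(x) rle(x == 0))

# Each element is a list of class "rle" with $lengths (integer run lengths)
# and $values (TRUE where the run consists of zeros)
str(consec[["1"]])

# Keep only the lengths of TRUE runs; the leading 0L makes an ID
# without any zeros return 0 instead of -Inf
max_zero <- sapply(consec, function(r) max(c(0L, r$lengths[r$values])))

# Turn the named vector into the requested two-column data frame
result <- data.frame(ID = as.numeric(names(max_zero)),
                     MaxZero = unname(max_zero))
result
```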
Unfortunately I'm not very experienced with lists. Could you help me extract
the maximum number of consecutive zeros for each ID and return the result as
a data frame containing the ID and that maximum?
Different approaches are also welcome. Since the real dataframe is quite
large, a fast solution is appreciated.
Best regards,
Carlos


Carlos Nasher
Buchenstr. 12
22299 Hamburg
tel: +49 (0)40 67952962
mobile: +49 (0)175 9386725
mail: [hidden email]
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


> -----Original Message-----
> So I want to get the max number of consecutive zeros of variable x for each
> ID. I found rle() to be helpful for this task; so I did:
>
> FUN <- function(x) {
> rles <- rle(x == 0)
> }
> consec <- lapply(split(df[,2],df[,1]), FUN)
You're probably better off with tapply and a function that returns what you want. You're probably also better off with a data frame name that isn't a function name, so I'll use dfr instead of df...
dfr <- data.frame(x=rpois(500, 1.5), ID=gl(5,100)) # 5 ID groups numbered 1-5, equal size but that doesn't matter for tapply
f2 <- function(x) {
  max( rle(x == 0)$lengths )
}
with(dfr, tapply(x, ID, f2))
S Ellison
*******************************************************************
This email and any attachments are confidential. Any use...


Hi,
Maybe this helps:
fun1 <- function(dat){
  lst1 <- lapply(split(dat,dat$ID),function(y){
    rl <- rle(y$x)
    data.frame(ID=unique(y$ID),MAXZero=max(rl$lengths[rl$values==0]))
  })
  do.call(rbind,lst1)
}
fun1(df)
# ID MAXZero
#1 1 2
#2 2 2
#3 3 1
A.K.
On Thursday, October 31, 2013 7:22 AM, Carlos Nasher < [hidden email]> wrote:


If I apply your function to my test data:
ID <- c(1,1,1,2,2,3,3,3,3)
x <- c(1,0,0,0,0,1,1,0,1)
data <- data.frame(ID=ID,x=x)
rm(ID,x)
f2 <- function(x) {
  max( rle(x == 0)$lengths )
}
with(data, tapply(x, ID, f2))
the result is
1 2 3
2 2 2
which is not what I'm aiming for. It should be
1 2 3
2 2 1
I think f2 does not return the max of consecutive zeros, but the max of any
consecutive number... Any idea how to fix this?
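The guess is right. rle(x == 0) encodes runs of both TRUE (zeros) and FALSE (non-zeros), and max() over all of $lengths can therefore pick a run of non-zero values. The small base-R illustration below uses the x values of ID 3 from the example data. The names x_id3 and runs are chosen here for the illustration.

```r
# f2 as posted: the longest run in x == 0, whatever its value
f2 <- function(x) max(rle(x == 0)$lengths)

x_id3 <- c(1, 1, 0, 1)   # x values of ID 3 in the example data
runs <- rle(x_id3 == 0)
runs$lengths             # 2 1 1
runs$values              # FALSE TRUE FALSE: only the middle run is zeros
f2(x_id3)                # 2, coming from the two leading 1s, not from zeros
```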
2013/10/31 S Ellison < [hidden email]>


> If I apply your function to my test data:
>
....
> the result is
> 1 2 3
> 2 2 2
>
...
> I think f2 does not return the max of consecutive zeros, but the max of any
> consecutive number... Any idea how to fix this?
The toy example of tapply using f2 does indeed return the maximum run lengths irrespective of the value repeated.
If you want only the runs of a particular value, you can select them using the $values element of the rle object, again inside the function.
Modifying to accommodate that (and again avoiding a data frame name that is the same as a base R function name; you managed it again!):
dfr <- data.frame(ID = c(1,1,1,2,2,3,3,3,3), x = c(1,0,0,0,0,1,1,0,1))
f3 <- function(x) {
  runs <- rle(x == 0L) # Often wise to be careful with == and numbers ... see FAQ 7.31
  with(runs, max(lengths[values]))
  # This works because here $values is TRUE for runs where x == 0 and FALSE
  # otherwise, and a logical index keeps only the TRUE elements; see ?'['
}
with(dfr, tapply(x, ID, f3))
or, more or less equivalently but a shade more generally
f4 <- function(x, select=0L) {
  runs <- rle(x)
  with(runs, max(lengths[values == select]))
}
with(dfr, tapply(x, ID, f4))
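tapply() returns a named one-dimensional array, not the data frame asked for in the original question. The sketch below shows one way to convert it. It assumes numeric IDs, and res, out and MaxZero are illustrative names.

```r
dfr <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  x  = c(1, 0, 0, 0, 0, 1, 1, 0, 1))

f4 <- function(x, select = 0L) {
  runs <- rle(x)
  with(runs, max(lengths[values == select]))
}

# names() of a 1-d array gives its dimnames, i.e. the group labels
res <- with(dfr, tapply(x, ID, f4))
out <- data.frame(ID = as.numeric(names(res)), MaxZero = as.vector(res))
out
```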
None of this checks that runs of zero exist in a group; if they don't, you'll get warnings and -Inf in the output, as max() then takes the maximum of an empty vector. You can add extra checks inside the function if that bothers you.
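The snippet below illustrates that edge case. It redefines f3 as above so that it runs on its own, and it feeds f3 a group containing no zeros; that input is made up here for the illustration.

```r
f3 <- function(x) {
  runs <- rle(x == 0L)
  with(runs, max(lengths[values]))
}

# No zeros: runs$values is all FALSE, lengths[values] is integer(0),
# and max() of an empty vector warns and returns -Inf
res <- withCallingHandlers(
  f3(c(1, 2, 3)),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res
```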


Hi Carlos,
With Bioconductor, this can simply be done with:
library(IRanges)
ID <- Rle(1:3, c(3,2,4))
x <- Rle(c(1,0,0,0,0,1,1,0,1))
groups <- split(x, ID)
idx <- groups == 0
Then:
> max(runLength(idx)[runValue(idx)])
1 2 3
2 2 1
Should be fast even with hundreds of thousands of groups (should take < 10 sec).
HTH,
H.
On 10/31/2013 04:20 AM, Carlos Nasher wrote:

Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
Email: [hidden email]
Phone: (206) 667-5791
Fax: (206) 667-1319


> None of this checks that runs of zero exist in a group; if they don't, you'll get warnings
> and -Inf in the output as max takes maxima of nothing. You can add extra checks inside
> the function if that bothers you.
Just adding a second argument, 0, to the call to max() will take care of that.
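Here is a sketch of that change applied to f3. The name f3_safe is made up for the illustration, and so is group 4 (all non-zero), which was added to the example data to show the effect.

```r
# f3 with the extra 0 passed to max(), as suggested above
f3_safe <- function(x) {
  runs <- rle(x == 0L)
  with(runs, max(lengths[values], 0))
}

dfr <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
                  x  = c(1, 0, 0, 0, 0, 1, 1, 0, 1, 5, 7))

# Group 4 has no zeros and now gives 0, with no warning
res <- with(dfr, tapply(x, ID, f3_safe))
res
```

Because 0 is a double, the results come back as doubles. Writing 0L instead would keep them integer.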
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: [hidden email] [mailto: [hidden email]] On Behalf
> Of S Ellison
> Sent: Thursday, October 31, 2013 11:27 AM
> To: Carlos Nasher; [hidden email]
> Subject: Re: [R] Count number of consecutive zeros by group


Hi
Another option is an sapply/split/sum construction:
with(data, sapply(split(x, ID), function(x) sum(x==0)))
Regards
Petr
> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of Carlos Nasher
> Sent: Thursday, October 31, 2013 6:46 PM
> To: S Ellison
> Cc: [hidden email]
> Subject: Re: [R] Count number of consecutive zeros by group


I think this gives a different result than the one the OP asked for:
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), x = c(1, 0,
0, 1, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)), .Names = c("ID",
"x"), row.names = c(NA, -22L), class = "data.frame")
with(df1, sapply(split(x, ID), function(x) sum(x==0)))
with(df1,tapply(x,list(ID),function(y) {rl <- rle(!y); max(c(0,rl$lengths[rl$values]))}))
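Speed was a concern in the original question. One possible base-R variation, which is not from the thread and has not been benchmarked here, runs rle() once over the whole column. It uses an integer key that changes whenever either the ID or the zero/non-zero status changes. The sketch assumes that the rows of each ID are contiguous, and max_zero_run is a hypothetical name. The final tapply() still loops over groups, so any gain should be measured on the real data.

```r
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
         3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
  x  = c(1, 0, 0, 1, 0, 0, 0, 1, 2, 0, 1, 0,
         0, 0, 0, 0, 1, 0, 0, 0, 1, 0)
)

# Hypothetical helper; assumes each ID occupies one contiguous block of rows
max_zero_run <- function(id, x) {
  is0 <- x == 0
  # Group code times 2 plus a 0/1 zero flag: the key differs between
  # neighbouring rows whenever the group or the zero status changes
  key <- as.integer(factor(id)) * 2L + is0
  runs <- rle(key)
  ends <- cumsum(runs$lengths)                     # last row of each run
  run_id <- id[ends]                               # group of each run
  run_len <- ifelse(is0[ends], runs$lengths, 0L)   # non-zero runs count as 0
  tapply(run_len, run_id, max)
}

with(df1, max_zero_run(ID, x))
```

Each ID has at least one run, so a group without zeros returns 0 rather than -Inf.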
A.K.
On Friday, November 1, 2013 6:01 AM, PIKAL Petr < [hidden email]> wrote:


Hi
Yes, you are right. This gives the total number of zeros, not the maximum number of consecutive zeros.
Regards
Petr
> -----Original Message-----
> From: arun [mailto: [hidden email]]
> Sent: Friday, November 01, 2013 2:17 PM
> To: R help
> Cc: PIKAL Petr; Carlos Nasher
> Subject: Re: [R] Count number of consecutive zeros by group

