partial cumsum

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

partial cumsum

ml-5
Hello,

I am searching for a function to calculate "partial" cumsums.

For example it should calculate the cumulative sums until a NA appears,
and restart the cumsum calculation after the NA.

this:

x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)

should become this:

1 3 6 NA 5  11  18  26  35  45


any ideas?

thank you and best regards,

    stefan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

William Dunlap


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of smu
> Sent: Wednesday, November 11, 2009 7:58 AM
> To: [hidden email]
> Subject: [R] partial cumsum
>
> Hello,
>
> I am searching for a function to calculate "partial" cumsums.
>
> For example it should calculate the cumulative sums until a
> NA appears,
> and restart the cumsum calculation after the NA.
>
> this:
>
> x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
>
> should become this:
>
> 1 3 6 NA 5  11  18  26  35  45

Perhaps
   > ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
    [1]  1  3  6 NA  5 11 18 26 35 45

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
 

> any ideas?
>
> thank you and best regards,
>
>     stefan
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

Karl Ove Hufthammer
On Wed, 11 Nov 2009 08:53:50 -0800 William Dunlap <[hidden email]>
wrote:
> > x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
> >
> > should become this:
> >
> > 1 3 6 NA 5  11  18  26  35  45
>
> Perhaps
>    > ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
>     [1]  1  3  6 NA  5 11 18 26 35 45

Clever way of putting the NA values at the end of each group. (I had to
study it quite a bit to understand what was going on, and why! :-) )

I wish cumsum took a 'na.rm' argument, though. It would make stuff like
this much easier.

--
Karl Ove Hufthammer

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

Karl Ove Hufthammer
On Wed, 11 Nov 2009 18:11:51 +0100 Karl Ove Hufthammer <[hidden email]>
wrote:
> I wish cumsum took a 'na.rm' argument, though. It would make stuff like
> this much easier.

Or, more specifically, a 'na.value', which could perhaps default to
'NA' to get the current behaviour, but which one could set 0 for
'cumsum', 1 for 'cumprod', Inf for 'cummin' and -Inf for 'cummax'.

--
Karl Ove Hufthammer

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

ml-5
In reply to this post by William Dunlap
On Wed, Nov 11, 2009 at 08:53:50AM -0800, William Dunlap wrote:
>
> Perhaps
>    > ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
>     [1]  1  3  6 NA  5 11 18 26 35 45
>

it takes some time to understand how it works, but it's perfect.

thank you,
 stefan

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

William Dunlap
> -----Original Message-----
> From: smu [mailto:[hidden email]]
> Sent: Wednesday, November 11, 2009 9:26 AM
> To: William Dunlap
> Cc: [hidden email]
> Subject: Re: [R] partial cumsum
>
> On Wed, Nov 11, 2009 at 08:53:50AM -0800, William Dunlap wrote:
> >
> > Perhaps
> >    > ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
> >     [1]  1  3  6 NA  5 11 18 26 35 45
> >
>
> it takes some time to understand how it works, but it's perfect.

Note that the 2nd argument assigns a group number
based on the number of NA's prior to the current
position in the vector.  The odd repeated calls to
rev() are there to put the NA's at the ends of the
groups, instead of at the beginnings:
   > x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
   > rev(cumsum(rev(is.na(x))))
    [1] 1 1 1 1 0 0 0 0 0 0
A more natural way to do this is
   > cumsum(is.na(c(NA,x[-length(x)])))
    [1] 1 1 1 1 2 2 2 2 2 2

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

>
> thank you,
>  stefan
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

Tony Plate-3
In reply to this post by William Dunlap
William Dunlap wrote:

>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
>
>> -----Original Message-----
>> From: [hidden email]
>> [mailto:[hidden email]] On Behalf Of smu
>> Sent: Wednesday, November 11, 2009 7:58 AM
>> To: [hidden email]
>> Subject: [R] partial cumsum
>>
>> Hello,
>>
>> I am searching for a function to calculate "partial" cumsums.
>>
>> For example it should calculate the cumulative sums until a
>> NA appears,
>> and restart the cumsum calculation after the NA.
>>
>> this:
>>
>> x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
>>
>> should become this:
>>
>> 1 3 6 NA 5  11  18  26  35  45
>
> Perhaps
>    > ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
>     [1]  1  3  6 NA  5 11 18 26 35 45
>

Nice simple function!  Here's a different approach I use that's faster for long vectors with many NA values.  Note however, that this approach can suffer from catastrophic round-off error because it does a cumsum over the whole vector (after replacing NA's with zeros) and then subtracting out the cumsum at the most recent NA values.  Most of the body of this function is devoted to allowing (an unreasonable degree of) flexibility in specification of where to reset.

cumsum.reset <- function(x, reset.at=which(is.na(x)), na.rm=F) {
    # compute the cumsum of x, resetting the cumsum to 0 at each element indexed by reset.at
    if (is.logical(reset.at)) {
        if (length(reset.at)>length(x)) {
            if ((length(reset.at) %% length(x))!=0)
                stop("length of reset.at must be a multiple of length of x")
            x <- rep(x, len=length(reset.at))
        } else if (length(reset.at)<length(x)) {
            if ((length(x) %% length(reset.at))!=0)
                stop("length of x must be a multiple of length of reset.at")
            reset.at <- rep(reset.at, len=length(x))
        }
        reset.at <- which(reset.at)
    } else if (!is.numeric(reset.at)) {
        stop("reset.at must be logical or numeric")
    }
    if (length(i <- which(reset.at<=1)))
        reset.at <- reset.at[-i]
    if (any(i <- is.na(x[reset.at])))
        x[reset.at[i]] <- 0
    if (na.rm && any(i <- which(is.na(x))))
        x[i] <- 0
    y <- cumsum(x)
    d <- diff(c(0, y[reset.at-1]))
    r <- rep(0, length(x))
    r[reset.at] <- d
    return(y - cumsum(r))
}



> x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
> cumsum.reset(x)
 [1]  1  3  6  0  5 11 18 26 35 45
> ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
 [1]  1  3  6 NA  5 11 18 26 35 45
>

The speedup from not breaking the input vector into smaller vectors is (to me) surprisingly small -- only a factor of 3:

> x <- replace(rnorm(1e6), sample(1e6, 10000), NA)
> all.equal(replace(ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum), is.na(x), 0), cumsum.reset(x))
[1] TRUE
> system.time(cumsum.reset(x))
   user  system elapsed
   0.31    0.03    0.35
> system.time(ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum))
   user  system elapsed
   0.99    0.05    1.15
>

So, I'd go with the ave() approach unless this type of cumsum is the core of a long computationally intensive job.  And if that's the case, it would make sense to code it in C.

-- Tony Plate

> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>  
>> any ideas?
>>
>> thank you and best regards,
>>
>>     stefan
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

Carl Witthoft
In reply to this post by ml-5
quote:
    > x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
    > rev(cumsum(rev(is.na(x))))
     [1] 1 1 1 1 0 0 0 0 0 0
A more natural way to do this is
    > cumsum(is.na(c(NA,x[-length(x)])))
     [1] 1 1 1 1 2 2 2 2 2 2
endquote

Both of which suggest the original problem could also be dealt with by
using rle().  Something like

xna<-is.na(x)
rle(xna)
and then apply cumsum to sections of x based onthe lengths returned by rle

Carl

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

Marc Schwartz-3
On Nov 11, 2009, at 4:45 PM, Carl Witthoft wrote:

 > By Bill Dunlap:

> quote:
>   > x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
>   > rev(cumsum(rev(is.na(x))))
>    [1] 1 1 1 1 0 0 0 0 0 0
> A more natural way to do this is
>   > cumsum(is.na(c(NA,x[-length(x)])))
>    [1] 1 1 1 1 2 2 2 2 2 2
> endquote
>
> Both of which suggest the original problem could also be dealt with  
> by using rle().  Something like
>
> xna<-is.na(x)
> rle(xna)
> and then apply cumsum to sections of x based onthe lengths returned  
> by rle
>
> Carl



Which would be something along the lines of the following:

rle.x <- rle(!is.na(x))$lengths

 > rle.x
[1] 3 1 6

 > as.numeric(unlist(sapply(split(x, rep(seq(along = rle.x), rle.x)),  
cumsum)))
  [1]  1  3  6 NA  5 11 18 26 35 45


Which is what I had been working on until I saw Bill's more elegant  
"one-liner" solution...  :-)

The interim use of split() gets you 'x' split by where the NA's occur:

 > rep(seq(along = rle.x), rle.x)
  [1] 1 1 1 2 3 3 3 3 3 3

 > split(x, rep(seq(along = rle.x), rle.x))
$`1`
[1] 1 2 3

$`2`
[1] NA

$`3`
[1]  5  6  7  8  9 10

and then you use cumsum() in sapply() (or lapply()) on each list  
element and coerce the result back to an un-named numeric vector.

Regards,

Marc Schwartz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: partial cumsum

mayouf.k
In reply to this post by Tony Plate-3
thank u very much Wiliam Dunlap,
your function is very helpful, i been trying to find the right algo,,,and i saw yours, it works perfectly.
i even checked with system.time function, and it's clear the cum.reset is the best.

> system.time(ave(J, rev(cumsum(rev(is.na(J)))), FUN=cumsum))
utilisateur     système      écoulé
       0.18        0.00        0.19
> system.time(cumsum.reset(J))
utilisateur     système      écoulé
          0           0           0

thks