inefficient for loop, is there a better way?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

inefficient for loop, is there a better way?

emorway
The code below is a small reproducible example of a much larger problem.
While the script below works, it is really slow on the true dataset with
many more rows and columns.  I'm hoping to get the same result to examp,
but with significant time savings.

The example below is setting up a data.frame for an ensuing regression
analysis.  The purpose of the script below is to appends columns to 'examp'
that contain values corresponding to the total number of days in the
previous 7 ('per') above some stage ('elev1' or 'elev2').  Is there a
faster method that leverages existing R functionality?  I feel like the
hack below is pretty clunky and can be sped up on the true dataset.  I
would like to run a more efficient script many times adjusting the value of
'per'.

ts <- 1:1000
examp <- data.frame(ts=ts, stage=sin(ts))

hi1 <- list()
hi2 <- list()
per <- 7
elev1 <- 0.6
elev2 <- 0.85
for(i in per:nrow(examp)){
    examp_per <- examp[seq(i - (per - 1), i, by=1),]
    stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
    stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)

    hi1 <- c(hi1, nrow(stg_hi_cond1))
    hi2 <- c(hi2, nrow(stg_hi_cond2))
}
examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: inefficient for loop, is there a better way?

R help mailing list-2
Try using stats::filter (not the unfortunately named dplyr::filter, which
is entirely different).
state>elev is a logical vector, but filter(), like most numerical
functions, treats TRUEs as 1s
and FALSEs as 0s.

E.g.,

> str( stats::filter( x=examp$stage>elev1, filter=rep(1,7),
method="convolution", sides=1) )
 Time-Series [1:1000] from 1 to 1000: NA NA NA NA NA NA 3 3 2 2 ...
> str( stats::filter( x=examp$stage>elev2, filter=rep(1,7),
method="convolution", sides=1) )
 Time-Series [1:1000] from 1 to 1000: NA NA NA NA NA NA 1 2 1 1 ...


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 12, 2017 at 5:36 PM, Morway, Eric <[hidden email]> wrote:

> The code below is a small reproducible example of a much larger problem.
> While the script below works, it is really slow on the true dataset with
> many more rows and columns.  I'm hoping to get the same result to examp,
> but with significant time savings.
>
> The example below is setting up a data.frame for an ensuing regression
> analysis.  The purpose of the script below is to appends columns to 'examp'
> that contain values corresponding to the total number of days in the
> previous 7 ('per') above some stage ('elev1' or 'elev2').  Is there a
> faster method that leverages existing R functionality?  I feel like the
> hack below is pretty clunky and can be sped up on the true dataset.  I
> would like to run a more efficient script many times adjusting the value of
> 'per'.
>
> ts <- 1:1000
> examp <- data.frame(ts=ts, stage=sin(ts))
>
> hi1 <- list()
> hi2 <- list()
> per <- 7
> elev1 <- 0.6
> elev2 <- 0.85
> for(i in per:nrow(examp)){
>     examp_per <- examp[seq(i - (per - 1), i, by=1),]
>     stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
>     stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)
>
>     hi1 <- c(hi1, nrow(stg_hi_cond1))
>     hi2 <- c(hi2, nrow(stg_hi_cond2))
> }
> examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
> examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: inefficient for loop, is there a better way?

YRichard
In reply to this post by emorway
One way of doing it with data.table. It seems to scale up pretty well.
It takes 4 seconds on my computer with ts <- 1:1e6.

library(data.table)
per <- 7
elev1 <- 0.6
elev2 <- 0.85

ts <- 1:1000

examp <- data.table(ts=ts, stage=sin(ts))
examp[, `:=`(days_abv_0.6_in_last_7  = apply(do.call('cbind',
shift(stage, 1:per)), 1, function(x) sum(x > elev1)),
                   days_abv_0.85_in_last_7 = apply(do.call('cbind',
shift(stage, 1:per)), 1, function(x) sum(x > elev2)))]



On 13 December 2017 at 14:36, Morway, Eric <[hidden email]> wrote:

> The code below is a small reproducible example of a much larger problem.
> While the script below works, it is really slow on the true dataset with
> many more rows and columns.  I'm hoping to get the same result to examp,
> but with significant time savings.
>
> The example below is setting up a data.frame for an ensuing regression
> analysis.  The purpose of the script below is to appends columns to 'examp'
> that contain values corresponding to the total number of days in the
> previous 7 ('per') above some stage ('elev1' or 'elev2').  Is there a
> faster method that leverages existing R functionality?  I feel like the
> hack below is pretty clunky and can be sped up on the true dataset.  I
> would like to run a more efficient script many times adjusting the value of
> 'per'.
>
> ts <- 1:1000
> examp <- data.frame(ts=ts, stage=sin(ts))
>
> hi1 <- list()
> hi2 <- list()
> per <- 7
> elev1 <- 0.6
> elev2 <- 0.85
> for(i in per:nrow(examp)){
>     examp_per <- examp[seq(i - (per - 1), i, by=1),]
>     stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
>     stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)
>
>     hi1 <- c(hi1, nrow(stg_hi_cond1))
>     hi2 <- c(hi2, nrow(stg_hi_cond2))
> }
> examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
> examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Yvan Richard, PhD
Environmental data scientist



Physical address: Level 4, 158 Victoria St, Te Aro, Wellington, New Zealand
Postal address: PO Box 27535, Wellington 6141, New Zealand
Phone: 022 643 7881

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: inefficient for loop, is there a better way?

Bert Gunter-2
In reply to this post by emorway
I believe ?filter will do what you want.

I used  n = 100 instead of 1000:

ts <- 1:100
examp <- data.frame(ts=ts, stage=sin(ts))
examp <- within(examp, {
  abv_1 <- filter(stage > 0.6, rep(1,7),sides =1)
  abv_2 <- filter(stage > .85, rep(1,7), sides =1)
   })
examp

I think this should be fairly fast, but let us know if not. There may be
other alternatives that might be faster.
Assuming it does what you wanted, of course.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Dec 12, 2017 at 5:36 PM, Morway, Eric <[hidden email]> wrote:

> The code below is a small reproducible example of a much larger problem.
> While the script below works, it is really slow on the true dataset with
> many more rows and columns.  I'm hoping to get the same result to examp,
> but with significant time savings.
>
> The example below is setting up a data.frame for an ensuing regression
> analysis.  The purpose of the script below is to appends columns to 'examp'
> that contain values corresponding to the total number of days in the
> previous 7 ('per') above some stage ('elev1' or 'elev2').  Is there a
> faster method that leverages existing R functionality?  I feel like the
> hack below is pretty clunky and can be sped up on the true dataset.  I
> would like to run a more efficient script many times adjusting the value of
> 'per'.
>
> ts <- 1:1000
> examp <- data.frame(ts=ts, stage=sin(ts))
>
> hi1 <- list()
> hi2 <- list()
> per <- 7
> elev1 <- 0.6
> elev2 <- 0.85
> for(i in per:nrow(examp)){
>     examp_per <- examp[seq(i - (per - 1), i, by=1),]
>     stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
>     stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)
>
>     hi1 <- c(hi1, nrow(stg_hi_cond1))
>     hi2 <- c(hi2, nrow(stg_hi_cond2))
> }
> examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
> examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.