# inefficient for loop, is there a better way?

The code below is a small reproducible example of a much larger problem. While the script works, it is really slow on the true dataset, which has many more rows and columns. I'm hoping to get the same result in 'examp', but with significant time savings.

The example sets up a data.frame for an ensuing regression analysis. The purpose of the script is to append columns to 'examp' that contain the number of days in the previous 7 ('per') on which the stage was above some threshold ('elev1' or 'elev2'). Is there a faster method that leverages existing R functionality? I feel like the hack below is pretty clunky and can be sped up on the true dataset. I would like to run a more efficient script many times, adjusting the value of 'per'.

    ts <- 1:1000
    examp <- data.frame(ts=ts, stage=sin(ts))

    hi1 <- list()
    hi2 <- list()
    per <- 7
    elev1 <- 0.6
    elev2 <- 0.85
    for(i in per:nrow(examp)){
        examp_per <- examp[seq(i - (per - 1), i, by=1),]
        stg_hi_cond1 <- subset(examp_per, examp_per$stage > elev1)
        stg_hi_cond2 <- subset(examp_per, examp_per$stage > elev2)

        hi1 <- c(hi1, nrow(stg_hi_cond1))
        hi2 <- c(hi2, nrow(stg_hi_cond2))
    }
    examp$days_abv_0.6_in_last_7   <- c(rep(NA, times=per-1), unlist(hi1))
    examp$days_abv_0.85_in_last_7  <- c(rep(NA, times=per-1), unlist(hi2))

## Re: inefficient for loop, is there a better way?

Try using stats::filter (not the unfortunately named dplyr::filter, which is entirely different). stage>elev is a logical vector, but filter(), like most numerical functions, treats TRUEs as 1s and FALSEs as 0s. E.g.,

    > str( stats::filter( x=examp$stage>elev1, filter=rep(1,7), method="convolution", sides=1) )
     Time-Series [1:1000] from 1 to 1000: NA NA NA NA NA NA 3 3 2 2 ...
    > str( stats::filter( x=examp$stage>elev2, filter=rep(1,7), method="convolution", sides=1) )
     Time-Series [1:1000] from 1 to 1000: NA NA NA NA NA NA 1 2 1 1 ...

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 12, 2017 at 5:36 PM, Morway, Eric wrote:

> The code below is a small reproducible example of a much larger problem.
> While the script below works, it is really slow on the true dataset with
> many more rows and columns. I'm hoping to get the same result to examp,
> but with significant time savings.
>
> The example below is setting up a data.frame for an ensuing regression
> analysis. The purpose of the script below is to append columns to 'examp'
> that contain values corresponding to the total number of days in the
> previous 7 ('per') above some stage ('elev1' or 'elev2'). Is there a
> faster method that leverages existing R functionality? I feel like the
> hack below is pretty clunky and can be sped up on the true dataset. I
> would like to run a more efficient script many times adjusting the value of
> 'per'.
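
Since the original question asks about rerunning the calculation for different values of 'per', here is a minimal sketch of how the stats::filter suggestion could be wrapped for that. The helper name `count_above` and the `paste0()` column-naming scheme are assumptions made for illustration and do not come from the thread. With `sides=1` the window covers only the current and earlier rows, so the first `per - 1` entries are NA, which matches the padding in the original loop.

```r
# Hypothetical helper (not from the thread): for each row, count how many of
# the last `per` values of `x` (including the current one) exceed `elev`.
count_above <- function(x, elev, per) {
  # x > elev is logical; filter() treats TRUE as 1 and FALSE as 0.
  # sides = 1 makes the window trail backwards from the current row.
  out <- stats::filter(x > elev, filter = rep(1, per),
                       method = "convolution", sides = 1)
  # Drop the "ts" class and return plain integers; leading NAs are kept.
  as.integer(round(out))
}

# Same example data as in the original post.
ts <- 1:1000
examp <- data.frame(ts = ts, stage = sin(ts))

per <- 7
for (elev in c(0.6, 0.85)) {
  # Column names follow the pattern used in the original script.
  col <- paste0("days_abv_", elev, "_in_last_", per)
  examp[[col]] <- count_above(examp$stage, elev, per)
}

# Cross-check one column against a direct window-by-window count.
direct <- c(rep(NA, per - 1),
            sapply(per:nrow(examp),
                   function(i) sum(examp$stage[(i - per + 1):i] > 0.6)))
stopifnot(identical(examp[["days_abv_0.6_in_last_7"]], as.integer(direct)))
```

Changing `per` or the vector of thresholds in the loop is then enough to regenerate the columns. The cross-check at the end is only there to confirm agreement with a direct count and can be dropped on the full dataset.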