Two libraries are needed to run the code you submitted:

library(dplyr)

library(sqldf)

Your IsHigh() function and its use can be replaced by a single line of code:

isHighFlow <- as.numeric(Flow >= 1600)
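
As a quick check, here is a minimal sketch comparing the two approaches on a short made-up Flow vector (the five values are purely illustrative, not taken from your data). It should return TRUE, since both versions produce the same numeric 0/1 vector.

```r
# Illustrative flow values only (not from the original data)
Flow <- c(1200, 1600, 2100, 1599.9, 1800)

# Original approach: scalar function applied element by element
IsHigh <- function(x) {
  if (x < 1600) return(0)
  if (1600 <= x) return(1)
}
viaLapply <- unlist(lapply(Flow, IsHigh))

# Vectorised replacement: the comparison yields TRUE/FALSE,
# and as.numeric() maps those to 1/0
isHighFlow <- as.numeric(Flow >= 1600)

# Compare the two results
identical(viaLapply, isHighFlow)
```

The comparison works on the whole vector at once, so no helper function or lapply() call is needed.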

The additional hour comes from how you use cumsum(). Each group it defines

includes the low-flow hour that starts it, so a group with one date element,

which you seem to characterize as zero hours, is counted as one, a group of

two is counted as two, etc.

cumsum(c(1, 0, 1, 1, 0, 1, 1, 1, 0))

If everything is off by one hour, just subtract 1 from hoursHighFlow. Problem solved.
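
To make the off-by-one visible, here is a small base R sketch on a made-up indicator vector (the values and the names rowsPerGroup and hoursBySum are my own, for illustration). It builds the same grouping key as your mutate() step and compares the row count per group, which is what n() returns, with the number of high-flow hours in each group.

```r
# Illustrative indicator: 1 = high-flow hour, 0 = low-flow hour
isHighFlow <- c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1)

# Same grouping key as in the posted code: it increments at every low-flow hour,
# so each group begins with a 0 followed by the run of 1s after it
highFlowInterval <- cumsum(isHighFlow == 0)

# Rows per group (what n() counts), including the 0 that opens each group
rowsPerGroup <- as.vector(table(highFlowInterval))

# High-flow hours per group, counting only the 1s
hoursBySum <- as.vector(tapply(isHighFlow, highFlowInterval, sum))

# Side-by-side view: rows is always one more than highHours here
data.frame(group = unique(highFlowInterval),
           rows = rowsPerGroup,
           highHours = hoursBySum)
```

One caveat, as far as I can tell: if the series happens to start with a high-flow hour, the first group (key 0) has no leading low-flow row, so subtracting 1 would undercount that one group. Using sum(isHighFlow) in place of n() inside summarise() should sidestep that case.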

Jean

> I'm trying to study times in which flow was operating at a given level or

> greater. To do so I have created a way to see how long the series has

> operated at a high level. But for some reason the data is calculating the

> runs one hour too long. Any ideas on why?

> Code:

> Date<-format(seq(as.POSIXct("2014-01-01 01:00"), as.POSIXct("2015-01-01 00:00"),

> by="hour"), "%Y-%m-%d %H:%M", usetz = FALSE)

> Flow<-runif(8760, 0, 2300)

> IsHigh<- function(x ){

> if (x < 1600) return(0)

> if (1600 <= x) return(1)

> }

>

> isHighFlow = unlist(lapply(Flow, IsHigh))

>

> df = data.frame(Date, Flow, isHighFlow )

>

> temp <- df %>%

> mutate(highFlowInterval = cumsum(isHighFlow==0)) %>%

> group_by(highFlowInterval) %>%

> summarise(hoursHighFlow = n(), minDate = min(as.character(Date)), maxDate

> = max(as.character(Date)))

> #Then join the two tables together.

> temp2<-sqldf("SELECT *

> FROM temp LEFT JOIN df

> ON df.Date BETWEEN temp.minDate AND temp.maxDate")

>

