How to find runs of a given length in a series of data?


jcrosbie
I'm trying to study the times during which flow was operating at or above a given level. To do so, I have created a way to see how long the series has operated at a high level. But for some reason the runs are calculated one hour too long. Any ideas why?





Code:
Date <- format(seq(as.POSIXct("2014-01-01 01:00"), as.POSIXct("2015-01-01 00:00"), by = "hour"),
               "%Y-%m-%d %H:%M", usetz = FALSE)
Flow<-runif(8760, 0, 2300)

IsHigh<- function(x ){
    if (x < 1600) return(0)
    if (1600 <= x) return(1)
}

isHighFlow = unlist(lapply(Flow, IsHigh))

df = data.frame(Date, Flow, isHighFlow )


temp <- df %>%
  mutate(highFlowInterval = cumsum(isHighFlow==0)) %>%
  group_by(highFlowInterval) %>%
  summarise(hoursHighFlow = n(), minDate = min(as.character(Date)), maxDate = max(as.character(Date)))

#Then join the two tables together.
temp2<-sqldf("SELECT *
  FROM temp LEFT JOIN df
  ON df.Date BETWEEN temp.minDate AND temp.maxDate")

Re: How to find runs of a given length in a series of data?

Adams, Jean
Two libraries are needed to run the code you submitted ...

library(dplyr)
library(sqldf)

Your IsHigh() function and its use can be replaced by a single line of code

isHighFlow <- as.numeric(Flow>=1600)
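As a quick check of that equivalence, here is a small sketch on a made-up
vector (the values in Flow below are illustrative only, not the original
data). The comparison Flow >= 1600 is vectorized, so it returns one
TRUE/FALSE per element, and as.numeric() turns those into 1/0.

```r
# Illustrative flows only; 1600 is the threshold from the original post.
Flow <- c(1200, 1600, 2100, 300, 1750)

# Element-by-element version, as in the original post.
IsHigh <- function(x) {
  if (x < 1600) return(0)
  if (1600 <= x) return(1)
}
loopVersion <- unlist(lapply(Flow, IsHigh))

# Vectorized version: a single comparison over the whole vector.
isHighFlow <- as.numeric(Flow >= 1600)

# Both approaches give the same 0/1 indicator.
identical(loopVersion, isHighFlow)  # TRUE
```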

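You are getting the additional hour because of how cumsum() is used.
cumsum(isHighFlow==0) increases by one at every low-flow hour, so each
group starts with a low-flow hour followed by the high-flow hours after
it, and n() counts that low-flow hour too.  A group with a single date
element, which you seem to characterize as zero hours, is counted as one,
a group with two elements as two, etc.
cumsum(c(1, 0, 1, 1, 0, 1, 1, 1, 0))

If everything is off by one hour, just subtract 1.  Problem solved.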
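To make the extra hour visible, the sketch below uses a short made-up
indicator vector (an assumption for illustration, not the original data).
It also shows base R's rle(), which counts runs of identical values
directly; using rle() here is only a suggested alternative, not part of
the original code.

```r
# Made-up indicator: 1 = high-flow hour, 0 = low-flow hour.
isHighFlow <- c(0, 1, 1, 0, 0, 1, 1, 1, 0)

# A new group starts at every low-flow hour.
grp <- cumsum(isHighFlow == 0)          # 1 1 1 2 3 3 3 3 4

# Rows per group (what n() counts): the leading low-flow hour is included.
groupSize <- as.vector(table(grp))      # 3 1 4 1

# Summing the indicator within each group counts only the high-flow hours.
highHours <- as.vector(tapply(isHighFlow, grp, sum))  # 2 0 3 0

# Alternative: run-length encoding, keeping only the runs of 1s.
r <- rle(isHighFlow)
runLengths <- r$lengths[r$values == 1]  # 2 3
```

One caveat with subtracting 1: if the series starts with a high-flow hour,
that first run lands in group 0 with no leading low-flow hour, so
subtracting 1 would undercount it. Summing the indicator or using rle()
avoids that special case.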

Jean


On Wed, May 6, 2015 at 5:55 PM, jcrosbie <[hidden email]> wrote:

> I'm trying to study times in which flow was operating at a given level or
> greater. To do so I have created a way to see how long the series has
> operated at a high level. But for some reason the data is calculating the
> runs one hour too long. Any ideas on why?
> --
> View this message in context:
> http://r.789695.n4.nabble.com/How-to-finding-a-given-length-of-runs-in-a-series-of-data-tp4706915.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>