average at specific hour "endpoints" of the day

Massimo Bressan
hello

given my reproducible example

#---
date<-seq(ISOdate(2017,1, 1, 0), by="hour", length.out = 48)
v1<-1:48
df<-data.frame(date,v1)

#--

I need to calculate the average of variable v1 at specific hour "endpoints" of the day: i.e. at hours 6.00 and 22.00 respectively

the desired result is

date v1
01/01/17 22:00 15.5
02/01/17 06:00 27.5
02/01/17 22:00 39.5

at hour 06:00 of each day the average is calculated by considering the 8 previous records (hours from 23:00 to 6:00)
at hour 22:00 of each day the average is calculated by considering the 16 previous records (hours from 7:00 to 22:00)

any hint please?

I've been trying some functions from the "xts" package, but without much success...

thanks for the help



        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: average at specific hour "endpoints" of the day

Jeff Newmiller
On Thu, 6 Apr 2017, Massimo Bressan wrote:

> hello
>
> given my reproducible example
>
> #---
> date<-seq(ISOdate(2017,1, 1, 0), by="hour", length.out = 48)
> v1<-1:48
> df<-data.frame(date,v1)
>
> #--

"date" and "df" are functions in base R... best to avoid hiding them by
re-using those names in the global environment
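
To make the masking point concrete, here is a minimal sketch. The names `my_dates` and `my_df` are made up for illustration; any names that do not collide with existing functions would do.

```r
# In a fresh R session, both names already refer to functions.
masks_date <- is.function(date)  # base::date() returns the current date and time as text
masks_df   <- is.function(df)    # stats::df() is the density of the F distribution

# Distinct names leave those functions reachable without a base:: or stats:: prefix.
my_dates <- seq(ISOdatetime(2017, 1, 1, 0, 0, 0, tz = "UTC"),
                by = "hour", length.out = 48)
my_df <- data.frame(datetime = my_dates, v1 = 1:48)
```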

ISOdate forces GMT, which many data sets that you might work with do NOT
use. It is better to use ISOdatetime to avoid letting hidden code
determine the timezone that is applied to (or compared with) your data.
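
A small sketch of the difference, assuming the "Etc/GMT+5" zone (five hours behind GMT) is available on the system; the variable names are illustrative only.

```r
# ISOdate() defaults to tz = "GMT", whatever the session timezone is.
t_gmt <- ISOdate(2017, 1, 1, 0)
attr(t_gmt, "tzone")

# ISOdatetime() takes the timezone as an explicit argument
# (its default, tz = "", means the session timezone).
t_local <- ISOdatetime(2017, 1, 1, 0, 0, 0, tz = "Etc/GMT+5")

# Both print as midnight on 1 January, yet they are different instants:
# midnight in "Etc/GMT+5" falls five hours after midnight GMT.
offset_hours <- as.numeric(difftime(t_local, t_gmt, units = "hours"))
```

Comparing or merging data stamped with one of these against data stamped with the other would silently shift every record by that offset.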

>
> I need to calculate the average of variable v1 at specific hour "endpoints" of the day: i.e. at hours 6.00 and 22.00 respectively
>
> the desired result is
>
> date v1
> 01/01/17 22:00 15.5
> 02/01/17 06:00 27.5
> 02/01/17 22:00 39.5
>
> at hour 06:00 of each day the average is calculated by considering the 8 previous records (hours from 23:00 to 6:00)
> at hour 22:00 of each day the average is calculated by considering the 16 previous records (hours from 7:00 to 22:00)
>
> any hint please?
>
> I've been trying with some functions within the "xts" package but without much result...

I am not sure how I would do this with xts, but the code below is a
fairly literal translation of your requirements (implemented two ways)
that should also be extensible if the data or the requirements change.

### Base R....

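Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
                                # the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 )
                                  , by="hour"
                                  , length.out = 48
                                  )
                  , v1 = 1:48
                  )
# one record per row, so summing nrec counts the records in each group
dta$nrec <- 1
# midnight at the start of each record's day
dta$date <- as.POSIXct( trunc.POSIXt( dta$datetime, units="days" ) )
# time of day, in hours since midnight
dta$tod <- as.numeric( dta$datetime - dta$date, units = "hours" )
# "Day" is after 06:00 up to and including 22:00; the rest is "Night"
dta$timeslot <- factor( ifelse( 6 < dta$tod & dta$tod <= 22
                               , "Day"
                               , "Night"
                               )
                       , levels = c( "Night", "Day" )
                       )
# end of the slot each record belongs to: 22:00 for "Day", 06:00 for
# "Night" (06:00 of the following day for records after 22:00)
dta$slotdatetime <- dta$date + as.difftime( ifelse( "Day" == dta$timeslot
                                                   , 22
                                                   , ifelse( 22 < dta$tod
                                                           , 24+6
                                                           , 6
                                                           )
                                                   )
                                           , units="hours"
                                           )
# sum v1 and count the records within each slot
dta2 <- aggregate( dta[ , c( "v1", "nrec" ) ]
                  , dta[ , c( "timeslot", "slotdatetime" ), drop=FALSE ]
                  , FUN = sum
                  )
# keep only complete slots: 16 hourly records for "Day", 8 for "Night"
dta2 <- subset( dta2, nrec == ifelse( "Day"==timeslot, 16, 8 ) )
dta2$v1mean <- dta2$v1 / dta2$nrec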

#### or if you don't mind the tidyverse....

library(dplyr) # wonderland of non-standard evaluation... beware, Alice!
Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
                                # the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 )
                                  , by="hour"
                                  , length.out = 48
                                  )
                  , v1 = 1:48
                  )
dta2 <- (   dta
         %>% mutate( date = as.POSIXct( trunc.POSIXt( datetime
                                                    , units="days"
                                                    )
                                      )
                   , tod = as.numeric( datetime - date, units = "hours" )
                   , timeslot = factor( ifelse( 6 < tod & tod <= 22
                                              , "Day"
                                              , "Night"
                                              )
                                      , levels = c( "Night", "Day" )
                                      )
                   , slotdatetime = date +
                            as.difftime( ifelse( "Day" == timeslot
                                               , 22
                                               , ifelse( 22 < tod
                                                       , 24+6
                                                       , 6
                                                       )
                                               )
                                       , units="hours"
                                       )
                   )
         %>% group_by( slotdatetime, timeslot )
         %>% summarise( v1mean = mean( v1 )
                      , nrec = n()
                      )
         %>% filter( nrec == ifelse( "Day"==timeslot, 16, 8 ) )
         )




> thanks for the help
> [[alternative HTML version deleted]]

This is a plain-text mailing list. Your chances of communicating
successfully when you post HTML format email are much worse than if you
post plain text using the "plain text" option in your mail program.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

Re: average at specific hour "endpoints" of the day

Massimo Bressan
hi jeff

thank you for your code, there is a lot in it to think about...

In the meantime I've managed to work out a (sort of) solution, but I'm still not completely satisfied with it

I would like to make it more elegant and possibly more general

here it is, so far...

####

mydate<-seq(ISOdatetime(2017,1, 1, 0, 0, 0), by="hour", length.out = 48)
v1<-1:48
mydf<-data.frame(mydate,v1)

library(zoo)

z<-zoo(mydf[,-1], mydf[,1])

z8<-rollapply(z, width=8, FUN=mean, align="right")
iz8<-which(as.numeric(strftime(index(z8), '%H'))==6)
z8<-z8[iz8]

z16<-rollapply(z, width=16, FUN=mean, align="right")
iz16<-which(as.numeric(strftime(index(z16), '%H'))==22)
z16<-z16[iz16]

fortify.zoo(z16)
fortify.zoo(z8)

#and then any sort of manipulation with dataframes

####
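
One possible way to generalise the idea, offered as a sketch rather than a definitive implementation: the helper `endpoint_means()` and its arguments `endpoints` and `step_hours` are hypothetical names, not part of zoo or base R. It assumes regularly spaced records (one every `step_hours` hours) and a timezone without daylight-saving transitions, and it drops any interval that does not contain the full number of records.

```r
# Mean of `value` over each interval that ends at one of the `endpoints`
# (hours of the day); each interval runs from just after the previous
# endpoint up to and including its own endpoint.
endpoint_means <- function(datetime, value, endpoints = c(6, 22), step_hours = 1) {
  endpoints <- sort(endpoints)
  n_end <- length(endpoints)
  lt  <- as.POSIXlt(datetime)
  tod <- lt$hour + lt$min / 60 + lt$sec / 3600   # time of day in hours
  day <- as.POSIXct(trunc(lt, units = "days"))   # midnight of each record's day
  # Position of the first endpoint at or after each time of day; records
  # later than the last endpoint belong to the first endpoint of the next day.
  idx <- findInterval(tod, endpoints, left.open = TRUE) + 1
  rollover <- idx > n_end
  k <- ifelse(rollover, 1, idx)
  slot_end <- day + as.difftime(endpoints[k] + ifelse(rollover, 24, 0),
                                units = "hours")
  # Length in hours of the interval that closes at each endpoint
  slot_hours <- diff(c(endpoints[n_end] - 24, endpoints))
  key <- as.numeric(slot_end)
  out <- data.frame(
    datetime = sort(unique(slot_end)),
    v1mean   = as.vector(tapply(value, key, mean)),
    nrec     = as.vector(tapply(value, key, length)),
    expected = as.vector(tapply(slot_hours[k] / step_hours, key,
                                function(x) x[1]))
  )
  # Keep only intervals that contain every expected record
  out[out$nrec == out$expected, c("datetime", "v1mean", "nrec")]
}

# Usage with the example data (timezone fixed explicitly for reproducibility)
mydate <- seq(ISOdatetime(2017, 1, 1, 0, 0, 0, tz = "UTC"),
              by = "hour", length.out = 48)
res <- endpoint_means(mydate, 1:48)
```

With the example data this keeps three intervals, ending at 22:00 on 1 January and at 06:00 and 22:00 on 2 January. Other endpoint hours can be passed through `endpoints`, for example `endpoints = c(6, 14, 22)`.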

bye
