10 Minute Time Window Aggregate By Multiple Grouping Variables

Previous Topic Next Topic
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

10 Minute Time Window Aggregate By Multiple Grouping Variables

Rahul Sharm
Hello All,
I have 1M rows of time stamped information about delivery trucks and their
trip related information from simulation .Detailed info the column names
and attributes for clarity

   1. id: alphanumeric factor/character
   2. datetime.of.trip.start: POSIXct yyyy-mm-dd hh:mm:ss
   3. datetime.of.trip.end: POSIXct yyyy-mm-dd hh:mm:ss
   4. trip.distance: numeric, miles
   5. trip.fuel.consumed: numeric, gallons

Close to 2500 trucks , simulated for a year with varying trip behavior.
This is what I am trying to accomplish 1. First, I want to create 10 minute
intervals from a chosen start date and end date from simulation, ex:
2011-03-30 00:00:00 to 2012-04-01 00:00:00 { approximately 8760 hours *
six, 10 minute blocks per hour = looking at 52K unique timestamps }. If it
works, will look into 15 minute, hourly and so on. * This will be
representative of a start.time.index * I want to recreate the same time
window with same frequency but for a column named end.time.index with time
increment of 10 minutes 2. Go to the raw data, inspect the "
datetime.of.trip.start " and "datetime.of.trip.end", get the time span,
match it with the derived 10 min time time intervals (start and end),
equally distribute the columns 4,5 (the numeric variables) among the 10
minute indices.
Single row example

   - datetime.of.trip.start: 2017-01-01 00:00:00
   - datetime.of.end.start: 2017-01-01 01:00:00
   - trip.distance = 60 miles
   - trip.fuel.consumed = 6 gallons

Interpretation of raw data, on January 1, 2017 from 00 am/midnight to 1 am,
the delivery truck traveled 60 miles using 6 gallons of fuel 3. I want to
be able to do this over entire raw data, get the aggregate sum of the
variables of interest (distance and fuel for example) by the 10.min start
and end time intervals across all ids as long as there is the 10min
start/end time overlap derived from the raw data 4. Repeat the same with
some grouping criteria from DateTime or suitable index, example day of week
or by truckid. From index 1 (00:00:00 to  00:10:00 ) to index 6(00:00:50 to
01:00:00), 60 miles and 6 gallons are equally distributed.

So far, I have created the 10 min HH:MM ids and exploring the foverlaps in
data.table, period.apply from xts, converting the time indices into
numerics and explore ddply %>% mutate %>% group by options. I am trying to
get a more streamlined version that would work as a time series object
which would help me get the descriptive statistics (aggregate by grouping
criteria for sums and means) and do some plotting and time series analysis
(RMA, smoothing, mixed regression models). I got stuck and totally lost
when working with zoo and xts, I was creating the index and working on
period.apply but was not sure how to work around with 2 POSIXct column
variables (start and end). I have the desired index and the cleaned data, I
am looking for a more elegant data.table solution to complete the 10min


 # simple illustration of the daa


start.time.index=as.POSIXct (c("2017-01-01 00:00:00","2017-01-01
00:10:00","2017-01-01 00:20:00","2017-01-01 00:30:00","2017-01-01
00:40:00","2017-01-01 00:50:00"), format="%Y-%m-%d %H:%M:%S")

end.time.index=as.POSIXct (c("2017-01-01 00:10:00","2017-01-01
00:20:00","2017-01-01 00:30:00","2017-01-01 00:40:00","2017-01-01
00:50:00","2017-01-01 01:00:00"), format="%Y-%m-%d %H:%M:%S")

trip.fuel=c(1,1,1,1,1,1) # total of 6 gallons in 1 hour equally
divided into 6 bins of 10 minute window.


        [[alternative HTML version deleted]]

[hidden email] mailing list -- To UNSUBSCRIBE and more, see
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.