I’ve got some transit data relating to bus stops for a GIS data set. Each row represents one stop on a route. For each record I have the start time of the route, the sequence in which the bus stops, the time the bus arrives at the first stop, and the time taken to get to each stop from the previous one in the sequence. Not all sequences of stops start with the number 1; some start with a higher number.
I need to make a new variable which has the time the bus arrives at each stop by using the start time from the stop with the lowest sequence number, to populate all of the arrival times for each stop in each route.

I have a very simple example below with just three routes and a few stops in each. My actual data set has a few million rows. I've also created a version of the data set I'm aiming to get.

There are two problems here. Firstly getting the data into the correct format to do the calculations with durations, and secondly running a function over the data set to obtain the times. It is the durations that are critical not the date, so using the POSIX methods doesn’t really seem appropriate here. Ultimately the times are going to be used in a route solver in an ArcSDE geodatabase. I tried to use strptime to format my times, but could not get them into a data.frame as presumably they are a list. In this example I’ve left them as strings.

Any help is much appreciated.

#create four columns with route id, stop sequence, interval time and route start time
ssq<-c(3,4,5,6,7,8,9,1,2,3,4,2,3,4,5,6,7,8)
tint<-c("00:00","00:12","00:03","00:06","00:09","00:02","00:04","00:00","00:08","00:10","00:10","00:00","00:02","00:04","00:08","00:02","00:01","00:01")
tst<-c(rep("18:20",7),rep("10:50",4),rep("16:15",7))
rtid<-c(rep("a",7),rep("b",4),rep("c",7))
df<-data.frame(cbind(ssq,tint,tst,rtid))
df

#correct data set should look like this
tarr<-c("18:20","18:32","18:35","18:41","18:50","18:52","18:56","10:50","10:58","11:08","11:18","16:15","16:17","16:21","16:29","16:31","16:32","16:33")
df2<-cbind(df,tarr)
df2
Hi gavinr,
Perhaps this will do what you want.

add_HH_MM<-function(x) {
 t1bits<-strsplit(as.character(x$tst),":")
 t2bits<-strsplit(as.character(x$tint),":")
 # start time and intervals as minutes since midnight
 start_mins<-as.numeric(sapply(t1bits,"[",1))*60+as.numeric(sapply(t1bits,"[",2))
 int_mins<-as.numeric(sapply(t2bits,"[",1))*60+as.numeric(sapply(t2bits,"[",2))
 total<-start_mins+cumsum(int_mins)
 # integer division and modulo cope with any number of hour rollovers,
 # and %%24 adjusts for running into the next day
 hours<-(total%/%60)%%24
 minutes<-total%%60
 return(paste(formatC(hours,width=2,flag="0"),formatC(minutes,width=2,flag="0"),sep=":"))
}
# assumes the rows of df are sorted by rtid, then by ssq
df$tarr<-unlist(by(df,df$rtid,add_HH_MM))

Jim

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
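A possible variant of the minutes-since-midnight idea, offered as a sketch rather than as anything posted in this thread: it sorts the rows first and uses ave() so that the result does not depend on how the rows happen to be ordered. The helper names hhmm_to_mins and mins_to_hhmm are made up for this illustration, and the data are the example from the original question.

```r
# Hypothetical helper: "HH:MM" strings to minutes since midnight
hhmm_to_mins <- function(x) {
  bits <- strsplit(as.character(x), ":", fixed = TRUE)
  sapply(bits, function(b) as.numeric(b[1]) * 60 + as.numeric(b[2]))
}

# Hypothetical helper: minutes back to "HH:MM", wrapping past midnight
mins_to_hhmm <- function(m) {
  sprintf("%02d:%02d", as.integer((m %/% 60) %% 24), as.integer(m %% 60))
}

# Example data from the question
ssq  <- c(3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7, 8)
tint <- c("00:00", "00:12", "00:03", "00:06", "00:09", "00:02", "00:04",
          "00:00", "00:08", "00:10", "00:10",
          "00:00", "00:02", "00:04", "00:08", "00:02", "00:01", "00:01")
tst  <- c(rep("18:20", 7), rep("10:50", 4), rep("16:15", 7))
rtid <- c(rep("a", 7), rep("b", 4), rep("c", 7))
DF <- data.frame(ssq, tint, tst, rtid, stringsAsFactors = FALSE)

# Sort so the cumulative sum runs in stop order within each route
DF <- DF[order(DF$rtid, DF$ssq), ]

# ave() applies cumsum within each route and returns values in row order
elapsed <- ave(hhmm_to_mins(DF$tint), DF$rtid, FUN = cumsum)
DF$tarr <- mins_to_hhmm(hhmm_to_mins(DF$tst) + elapsed)
DF
```

Because everything is held as plain numbers until the last step, no date is involved at any point, which may suit the case where only durations matter.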
Another way:
#create four columns with route id, stop sequence, interval time and route start time
ssq <- c( 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7, 8 )
tint <- c( "00:00", "00:12", "00:03", "00:06", "00:09", "00:02", "00:04"
         , "00:00", "00:08", "00:10", "00:10"
         , "00:00", "00:02", "00:04", "00:08", "00:02", "00:01", "00:01" )
tst <- c( rep( "18:20", 7 ), rep( "10:50", 4 ), rep( "16:15", 7 ) )
rtid <- c( rep( "a", 7 ), rep( "b", 4 ), rep( "c", 7 ) )
# Don't use cbind to make data frames... it usually ends up
# forcing all columns to be character or factors
# Also, avoid using "df" as a variable name... it is the name of
# a function in base R, so that gets confusing fast
DF <- data.frame( ssq, tint, tst, rtid, stringsAsFactors=FALSE )
DF

#correct data set should look like this
tarr <- c( "18:20", "18:32", "18:35", "18:41", "18:50", "18:52", "18:56"
         , "10:50", "10:58", "11:08", "11:18"
         , "16:15", "16:17", "16:21", "16:29", "16:31", "16:32", "16:33" )
DF2 <- data.frame( DF, tarr, stringsAsFactors=FALSE )
DF2

library(dplyr)
# format() is used to render the date-times as "HH:MM" strings
DFs <- ( DF
       %>% group_by( rtid )
       %>% mutate( tarr = format( as.POSIXct( tst, format="%H:%M", tz="GMT" )
                                + as.difftime( cumsum( as.numeric( as.difftime( tint, format="%H:%M" )
                                                                 , units="mins" ) )
                                             , units="mins" )
                                , format="%H:%M" ) )
       %>% as.data.frame # removes grouping behavior from result
       )
identical( DFs, DF2 )
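On the concern that the POSIX classes carry a date that is not wanted: the small check below is only an illustration (the start time and offsets are invented, not taken from the data) of how the date part drops out when the result is formatted with "%H:%M". A bus starting at 23:50 runs past midnight without any special handling.

```r
# Illustrative values only: a 23:50 start with cumulative offsets in minutes
start   <- as.POSIXct("23:50", format = "%H:%M", tz = "GMT")
offsets <- as.difftime(c(0, 15, 90), units = "mins")

# The date is carried internally but discarded by the "%H:%M" format
arrivals <- format(start + offsets, format = "%H:%M")
arrivals
```

Fixing tz = "GMT" keeps daylight-saving changes out of the arithmetic, which is presumably why the time zone is fixed the same way in the dplyr code.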