Comparing dates in two large data frames


Kulupp
Dear all,

I have two data frames (df1 and df2), and for each time point in df1 I
want to know whether it falls within any of the time spans in df2. The
result (e.g. "no"/"yes" or 0/1) should be stored in a new column of df1.

Here is the code to create the two data frames (they are approximately
the same size as my original data frames):

# create data frame df1
ti1 <- seq.POSIXt(from=as.POSIXct("2020/01/01", tz="UTC"),
                  to=as.POSIXct("2020/06/01", tz="UTC"), by="10 min")
df1 <- data.frame(Time=ti1)

# create data frame df2 with random time spans, i.e. start and end dates
start <- sort(sample(seq(as.POSIXct("2020/01/01", tz="UTC"),
                         as.POSIXct("2020/06/01", tz="UTC"), by="1 mins"), 5000))
end   <- start + 120   # POSIXct arithmetic is in seconds: 2-minute spans
df2 <- data.frame(start=start, end=end)

Everything I tried (ifelse combined with sapply, or for loops) has been
very, very slow, so I am looking for a reasonably fast solution.
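For comparison, here is a minimal sketch of the kind of brute-force check described above. It is not the poster's actual code; the function name `in_any_span_naive` and the toy data are made up for illustration. Each time point is compared with every interval, so with the data above that is roughly 22,000 × 5,000, or about 110 million comparisons, which is why loops like this are slow.

```r
# Brute-force check: for each time point, test it against every interval.
# Correct, but it performs length(times) * length(start) comparisons.
in_any_span_naive <- function(times, start, end) {
  vapply(seq_along(times),
         function(i) any(times[i] >= start & times[i] <= end),
         logical(1))
}

# Toy data for illustration (not the data frames from the question)
t0    <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
start <- t0 + c(300, 1800)          # spans start at 00:05 and 00:30
end   <- start + 120                # 2-minute spans, as in df2
times <- seq(t0, by = "10 min", length.out = 5)   # 00:00, 00:10, ..., 00:40

in_any_span_naive(times, start, end)
#> [1] FALSE FALSE FALSE  TRUE FALSE
```

Indexing with `seq_along(times)` rather than looping over `times` directly keeps each element as POSIXct, so the comparisons stay between date-times.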

Thanks a lot in advance for any hints!

Cheers,

Thomas

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Comparing dates in two large data frames

Rui Barradas
Hello,

The following solution seems to work and is fast, because findInterval
is fast. It first determines where in df2$start each value of df1$Time
falls, i.e. the index of the last start that is not after that time.
It then uses that index to check whether each Time is not greater than
the corresponding df2$end.
I checked it against a small subset of df1 and the results were right.
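To make the two steps concrete, here they are on small numeric vectors standing in for df2$start, df2$end and df1$Time (the numbers are made up for illustration). findInterval returns, for each value, the index of the last start that is less than or equal to it, and 0 if the value precedes every start:

```r
# Toy stand-ins for df2$start, df2$end and df1$Time
starts <- c(10, 20, 30)
ends   <- starts + 2                 # spans [10,12], [20,22], [30,32]
x      <- c(5, 10, 25, 31)

# Step 1: index of the last start <= x (0 when x precedes every start)
inx <- findInterval(x, starts)
inx
#> [1] 0 1 2 3

# Step 2: x is inside a span if it also does not exceed that span's end.
# Zeros are excluded because R silently drops index 0, so ends[inx]
# would be shorter than x.
ok <- inx != 0
inside <- logical(length(x))
inside[ok] <- x[ok] <= ends[inx[ok]]
inside
#> [1] FALSE  TRUE FALSE  TRUE
```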


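result <- logical(nrow(df1))               # one FALSE per row of df1
inx <- findInterval(df1$Time, df2$start)   # needs df2$start sorted; 0 = before first start
not_zero <- inx != 0                       # Times before every start stay FALSE
result[not_zero] <- df1$Time[not_zero] <= df2$end[ inx[not_zero] ]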
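One caveat: the lookup above only checks the span with the latest start at or before each Time. That is enough for the example data, where every span is 120 seconds long, so a later start also means a later end. If spans can have different lengths, an earlier and longer span might still cover a time that the latest-starting span misses. A minimal sketch of one way to handle that, keeping the same findInterval idea, compares each time with the running maximum of the end times (cummax) instead of the end of a single span: every span with index <= k has already started, so the time is covered by one of them exactly when the largest end among them is not before it. The function name `in_any_span` and the column name `inside` are made up for illustration.

```r
# Hypothetical helper generalising the findInterval approach: TRUE where
# times[i] lies in at least one [start, end] span, even when spans overlap
# or have different lengths.
in_any_span <- function(times, start, end) {
  o     <- order(start)                  # findInterval needs sorted starts
  s     <- as.numeric(start)[o]          # POSIXct -> seconds since epoch
  e_max <- cummax(as.numeric(end)[o])    # latest end among spans 1..k
  x     <- as.numeric(times)
  inx   <- findInterval(x, s)            # last span starting at or before x
  inside <- logical(length(x))           # stays FALSE where inx == 0
  ok    <- inx != 0
  inside[ok] <- x[ok] <= e_max[inx[ok]]
  inside
}

# Toy example where a long, early span covers a point that the
# latest-starting span does not
t0    <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
spans <- data.frame(start = t0 + c(0, 300),      # 00:00 and 00:05
                    end   = t0 + c(3600, 420))   # 01:00 and 00:07
pts   <- data.frame(Time = t0 + c(-60, 600, 4000))

# Store the result as a new 0/1 column; ifelse(..., "yes", "no") also works
pts$inside <- as.integer(in_any_span(pts$Time, spans$start, spans$end))
pts$inside
#> [1] 0 1 0
```

With the question's data this would be `df1$inside <- as.integer(in_any_span(df1$Time, df2$start, df2$end))`; since all spans there have the same length, it should agree with the shorter code above. The numbers are converted with as.numeric because cummax is not defined for POSIXct objects.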


Hope this helps,

Rui Barradas


In reply to Kulupp's message of 10/04/21 at 12:06.
