Test if date occurs in multiple date ranges with R
Asked Answered
I

3

5

I have a data frame with multiple date ranges (45 to be exact):

Range  Start       End
1      2014-01-01  2014-02-30
2      2015-01-10  2015-03-30
3      2016-04-20  2016-10-12
...    ...         ...

They will never overlap

I also have a data frame with various event dates (200K+):

Event  Date
1      2014-01-02
2      2014-03-20
3      2015-04-01
4      2016-08-18
...    ...

I want to test if these dates fall within any of these ranges:

Event  Date        InRange
1      2014-01-02  TRUE
2      2014-03-20  FALSE
3      2015-04-01  FALSE
4      2016-08-18  TRUE
...

What is the best way to perform this test? I have looked at lubridate's between and interval functions as well as various Stackoverflow questions, but cannot find a good solution.

Indeterminism answered 21/6, 2016 at 0:21 Comment(0)
C
6

You can create a vector of your date range from the first data frame, then use %in% operator to check if each date of your events is in this date range. Assuming your first data frame is dateRange, and second events, putting the above logic in one line would be:

events$InRange <- events$Date %in% unlist(Map(`:`, dateRange$Start, dateRange$End))

events
  Event       Date InRange
1     1 2014-01-02    TRUE
2     2 2014-03-20   FALSE
3     3 2015-04-01   FALSE
4     4 2016-08-18    TRUE

Where we used the Map to create the date range vector. Map combined with : operator create a list of date range from the Start to the End. Somewhere close to list(2014-01-01 : 2014-02-30, 2015-01-10 : 2015-03-30, 2016-04-20 : 2016-10-12 ...)(symbolically, not valid), with the unlist, we flatten it as a vector of date range which could then be used with %in% conveniently.

Convector answered 21/6, 2016 at 0:59 Comment(2)
Awesome one liner - Works a charm!!! Can you explain the unlist and Map part please?Indeterminism
Just updated the explanation. If it is still not clear, check the output of Map(:, c(1,2), c(3,4)) and you will see how it works.Convector
C
4

Having ordered, non-overlapping intervals in your first "data.frame", you could test -for each event date- if it is above a $Start and its respective $End. Using findInterval to reduce relational comparisons and memory needed.

findInterval(events$Date, ranges$Start) > findInterval(events$Date, ranges$End)
#[1]  TRUE FALSE FALSE  TRUE

With data (modified "2014-02-30"):

ranges = structure(list(Range = 1:3, Start = structure(c(16071, 16445, 
16911), class = "Date"), End = structure(c(16129, 16524, 17086
), class = "Date")), .Names = c("Range", "Start", "End"), row.names = c(NA, 
-3L), class = "data.frame")

events = structure(list(Event = 1:4, Date = structure(c(16072, 16149, 
16526, 17031), class = "Date")), .Names = c("Event", "Date"), row.names = c(NA, 
-4L), class = "data.frame")
Conde answered 21/6, 2016 at 9:28 Comment(0)
P
1

Write your own function to check if a list of dates are in any of a number of intervals.

date.in <- function(x){
m <- NULL
for (i in 1:NROW(df)){m <- c(m,  ifelse(x>=df[i,1] & x<=df[i,2], TRUE, FALSE))}
any(m)}

Data:

df <- data.frame(start=c("2014-01-01", "2015-01-10", "2016-04-20"), 
       end=c("2014-02-30", "2015-03-30", "2016-10-12"))
df[] <- lapply(df, as.character)

s <- c("2014-01-02", "2014-03-20", "2015-04-01", "2016-08-18")

Test using string s.

as.character(lapply(s, date.in))#TRUE FALSE FALSE TRUE
Palladino answered 21/6, 2016 at 0:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.