My Problem
Consider a set of data with two intervals. For instance, consider a student schedule of classes. Each record has a begin and end date, and each class has a period start time and a period end time. But this schedule is not 'normalized' in the sense that some records overlap. So if you search for records encompassing a given date and period for a student, you might get multiple matches.
Here's a contrived example. I represent the dates as integers to simplify the problem:
declare @schedule table (
student char(3),
fromDate int,
toDate int,
fromPeriod int,
toPeriod int
)
insert @schedule values
('amy', 1, 7, 7, 9),
('amy', 3, 9, 5, 8),
('amy', 10, 12, 1, 3),
('ted', 1, 5, 11, 14),
('ted', 7, 11, 13, 16);
Amy's date and period ranges either overlap or are adjacent. If I queried for 'date 5 period 7', I would get two matches. I need these reworked so that they represent the same 'area' but no longer overlap.
Ted's periods overlap but his dates do not. This means there's no real overlap, so no need to re-work anything.
My Research
I've read many posts and some articles on working overlapping intervals. Namely:
- Merge overlapping date intervals
- Flattening intersecting timespans
- Condense Time Periods with SQL
- SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value
- Flatten/merge overlapping time intervals
- SQL Server separate overlapping dates
I've implemented one from Itzik from a blog entitled 'solutions-packing-date-and-time-intervals-puzzle' that has worked great for one particular project. I don't think it's a stable link, but I've found a copy of it here.
But I'm having difficulty extending the knowledge in those resources to my problem at hand. It might be my limitation. I have trouble following them. I've studied Itzik's solution and have come to understand a lot of it, but I do recall there's one piece I just couldn't understand. Or it might be that those solutions only work with singular ranges.
My Attempt
I resolved this question by treating the ranges as literal rectangle objects. It works. I've even made a version of it somewhat performant in my own application. So I'll post it as a solution in case it is of use to anyone with the same issue.
But it is so long and involved and there are enough quirks to it (e.g. buffering lines, looping shapes, working with float values, rounding issues) that I can't help but think that there's a much better way. Can the concepts of my listed resources be extended to dual ranges? Or do some SRID's allow cutting rectangles with zero-length lines?
Expected Results:
There is no one answer to this problem, because you can aggregate ranges and deconstruct them in different ways. But to minimize the number of resulting rectangles, there are really only two acceptable answers. Visually, with dates on the X axis and periods on the Y axis, overlapping ranges can start out like this:
+------------+
| |
| +------------+
| |||||||| | <- 2 overlapping rectangles
+----| |
| |
+------------+
We can rework it this way:
+---+ +-----+
| | | |
| | | | +---+ <- 3 non-overlapping
| | | | | | vertically cut rectangles
+---| | | | |
| | | |
+-----+ +---+
Or this way:
+-----------+
+-----------+
+-----------------+ <- 3 non-overlapping
+-----------------+ horizontally cut rectangles
+-----------+
+-----------+
Going with vertical cuts, the results would look like this:
+-------------------------------------------+
|student|fromDate|toDate|fromPeriod|toPeriod|
|-------------------------------------------|
|amy |1 |2 |7 |9 |
|amy |3 |7 |5 |9 |
|amy |8 |9 |5 |8 |
|amy |10 |12 |1 |3 |
|ted |1 |5 |11 |14 |
|ted |7 |11 |13 |16 |
+-------------------------------------------+
Going with horizontal cuts, the results would look like this:
+-------------------------------------------+
|student|fromDate|toDate|fromPeriod|toPeriod|
|-------------------------------------------|
|amy |1 |7 |9 |9 |
|amy |1 |9 |7 |8 |
|amy |3 |9 |5 |6 |
|amy |10 |12 |1 |3 |
|ted |1 |5 |11 |14 |
|ted |7 |11 |13 |16 |
+-------------------------------------------+
Either is acceptable. Though, to keep it deterministic and tractable, you would want to choose one strategy and stick with it.