Determine if a range is completely covered by a set of ranges
Asked Answered
M

4

2

How can I check if a range is completely covered by a set of ranges. In the following example:

WITH ranges(id, a, b) AS (
    SELECT 1,  0, 40 UNION
    SELECT 2, 40, 60 UNION
    SELECT 3, 80, 100 UNION
    SELECT 4, 10, 30
), tests(id, a, b) AS (
    SELECT 1, 10, 90 UNION
    SELECT 2, 10, 60
)
SELECT *
FROM tests
WHERE -- ?
  • I want to select 10, 60 because all of it is covered by 0, 40 and 40, 60 (and 10, 30)
  • I want to exclude 10, 90 because it is exposed between 60, 80

Assume that a is inclusive and b is exclusive i.e. the value 40 belongs to [40, 60) and not [0, 40). The ranges can contain gaps and all kind of overlaps.

The actual problem involves date+time data but dates are just numbers. I am using SQL server but generic solution is preferred.

Moralist answered 18/9, 2018 at 10:0 Comment(1)
It looks like you're using closed intervals here. Be aware that, especially for datetime data, a semi-closed interval should be preferred. It's easier to compute the endpoints, you're less likely to skip data that should be included, etc.Lizettelizotte
B
1

You want a recursive query finding the real ranges (0 to 60 and 80 to 100 in your case). We'd start with the ranges given and look for ranges extending these. At last we stick with the most extended ranges (e.g. the range 10 to 30 can be extended to 0 to 40 and then to 0 to 60, so we keep the widest range 0 to 60).

with wider_ranges(a, b, grp) as
(
  select a, b, id from ranges
  union all
  select
    case when r1.a < r2.a then r1.a else r2.a end,
    case when r1.b > r2.b then r1.b else r2.b end,
    r1.grp
  from wider_ranges r1
  join ranges r2 on (r2.a < r1.a and r2.b >= r1.a)
                 or (r2.b > r1.b and r2.a <= r1.b)
)
, real_ranges(a, b) as
(
  select distinct min(a), max(b)
  from wider_ranges
  group by grp
)
select * 
from tests
where exists
(
  select *
  from real_ranges
  where tests.a >= real_ranges.a and tests.b <= real_ranges.b
);

Rextester demo: http://rextester.com/BDJA16583

As requested this works in SQL Server, but is standard SQL, so it should work in about every DBMS featuring recursive queries.

Beiderbecke answered 18/9, 2018 at 10:27 Comment(0)
F
2

This is a recursive solution similar to Thorsten's. Just providing another example to have.

WITH ranges(id, a, b) AS (
    SELECT 1,  0, 40 UNION
    SELECT 2, 40, 60 UNION
    SELECT 3, 80, 100 UNION
    SELECT 4, 10, 30 
), tests(id, a, b) AS
(   
        SELECT 1 as id, 10 as a, 90 as b
        UNION
        SELECT 2, 10, 60
), rangeFinder(a, b, ra, rfb) AS
(
    SELECT a, b, 0 AS ra, 0 AS rfb 
    FROM ranges AS r
    UNION ALL
    SELECT rangeFinder.a, ranges.b, ranges.a, rangeFinder.b 
    FROM ranges 
    JOIN rangeFinder
        ON ranges.b > rangeFinder.b
        AND ranges.a <=rangeFinder.b
), islands(a, b) AS
(
    SELECT a, b 
    FROM rangeFinder
    WHERE a NOT IN (SELECT ra FROM rangeFinder)
        AND b NOT IN (SELECT rfb FROM rangeFinder)
)
SELECT t.id, t.a, t.b FROM 
tests t
JOIN islands i
ON t.a >= i.a
AND t.b <= i.b

Demo here: http://rextester.com/HDQ52126

Filippo answered 18/9, 2018 at 11:35 Comment(3)
I like the names you are using for your queries. rangeFinder is a great name for the recursive cte.Beiderbecke
I had actually used realRanges first, but decided that following it with islands seemed conflicting.Filippo
Good thinking. It's often the names we choose that make our queries more or less readable.Beiderbecke
B
1

You want a recursive query finding the real ranges (0 to 60 and 80 to 100 in your case). We'd start with the ranges given and look for ranges extending these. At last we stick with the most extended ranges (e.g. the range 10 to 30 can be extended to 0 to 40 and then to 0 to 60, so we keep the widest range 0 to 60).

with wider_ranges(a, b, grp) as
(
  select a, b, id from ranges
  union all
  select
    case when r1.a < r2.a then r1.a else r2.a end,
    case when r1.b > r2.b then r1.b else r2.b end,
    r1.grp
  from wider_ranges r1
  join ranges r2 on (r2.a < r1.a and r2.b >= r1.a)
                 or (r2.b > r1.b and r2.a <= r1.b)
)
, real_ranges(a, b) as
(
  select distinct min(a), max(b)
  from wider_ranges
  group by grp
)
select * 
from tests
where exists
(
  select *
  from real_ranges
  where tests.a >= real_ranges.a and tests.b <= real_ranges.b
);

Rextester demo: http://rextester.com/BDJA16583

As requested this works in SQL Server, but is standard SQL, so it should work in about every DBMS featuring recursive queries.

Beiderbecke answered 18/9, 2018 at 10:27 Comment(0)
M
0

As described in the accepted answer, the solution is to merge overlapping ranges together, then determine if test range is present inside one of the merged ranges.

Besides join and/or recursion, you can use the sorting approach with window functions to merge overlapping ranges:

WITH ranges(id, a, b) AS (
    SELECT 1,  0, 40 UNION
    SELECT 2, 40, 60 UNION
    SELECT 3, 80, 100 UNION
    SELECT 4, 10, 30
), tests(id, a, b) AS (
    SELECT 1, 10, 90 UNION
    SELECT 2, 10, 60
), ranges_chg AS (
    SELECT *, CASE WHEN MAX(b) OVER (ORDER BY a ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) >= a THEN 0 ELSE 1 END AS chg
    FROM ranges
), ranges_grp AS(
    SELECT *, SUM(chg) OVER (ORDER BY a) AS grp
    FROM ranges_chg
), merged_ranges AS (
    SELECT MIN(a) AS a, MAX(b) AS b
    FROM ranges_grp
    GROUP BY grp
)
SELECT *
FROM tests
WHERE EXISTS (
    SELECT 1
    FROM merged_ranges
    WHERE merged_ranges.a <= tests.a AND tests.b <= merged_ranges.b
)

Result and Fiddle.

| id | a  | b  |
|----|----|----|
| 2  | 10 | 60 |

The data inside range_grp CTE will give you an idea of how it works:

| id | a  | b   | max b over... | chg | grp |
|----|----|-----|---------------|-----|-----|
| 1  | 0  | 40  | NULL          | 1   | 1   |
| 4  | 10 | 30  | 40            | 0   | 1   |
| 2  | 40 | 60  | 40            | 0   | 1   |
| 3  | 80 | 100 | 60            | 1   | 2   |
Moralist answered 18/9, 2018 at 10:0 Comment(0)
E
0

This is a general form of the solution. The idea is to do the following:

  • Get a list of all points that could not be in a range. This is all range starts and range ends.
  • Check if any of them are not in a range.

This is based on the operation that a point not in a range will be one of the numbers included in either table:

with tc as (
      select t.test, r.candidate
      from tests t join
           (select r.a as candidate from ranges union all
            select r.b from ranges
           ) r
           on r.candiate >= t.a and r.candidate < t.b
      union all
      select t.test, t.a
      from tests t
      union all
      select t.test, t.b
      from tests t
     )
select distinct tc.test
from tc
where not exists (select 1
                  from ranges r
                  where tc.candidate >= r.a and tc.candidate < r.b
                 );

Because ranges include the first item, you don't actually have to check that. So the list of candidates can be reduced:

with tc as (
      select t.test, r.candidate
      from tests t join
           (select r.b as candidate from ranges
           ) r
           on r.candidate >= t.a and r.r < t.b
      union all
      select t.test, t.a
      from tests t
      union all
      select t.test, t.b
      from tests t
     )
Efferent answered 18/9, 2018 at 10:26 Comment(4)
I tried very hard to fix your query work but I couldn't.Moralist
@SalmanA . . . and what issue do you have? A SQL Fiddle (or related site would help).Efferent
The column/table names in the query are all wrong (e.g. r.r). I tried all combinations of a and b but at best the query returns all rows.Moralist
@SalmanA . . . That column should be called candidate. I shouldn't change the name of things in the middle of writing queries.Efferent

© 2022 - 2024 — McMap. All rights reserved.