Group close numbers

Asked 31/8, 2011 at 14:27 Answered 1/9, 2011 at 17:9

Solved sql sql-server sql-server-2008 gaps-and-islands

I have a table with 2 columns of integers. The first column represents start index and the second column represents end index.

Simple So far. What I would like to do is group all the records that follow together:

START END
1     25
30    49
60    67

A record can follow by Starting on the same index as the previous end index or by a margin of 1:

START END
1     10
10    20

And

START END
1     10
11    20

will both result in

START END
1     20

I'm using SQL Server 2008 R2.

Any help would be Great

Vimineous answered 31/8, 2011 at 14:27 Comment(3)

I think this is an interesting question, but have you actually made any attempts to do this yourself yet? Queries you've tried? – Chibcha 31/8, 2011 at 14:30

Might you have any overlapping pairs such as 1,8 AND 3,15? – Strake 31/8, 2011 at 14:45

Tx for your comment Martin.. There is no overlapping pairs. Jadarnel27 - I solved this issue using sql cursor but this solution isn't efficient at all and I'm looking for more elegant and better solution. – Vimineous 1/9, 2011 at 7:30

This works for your example, let me know if it doesn't work for other data

create table #Range 
(
  [Start] INT,
  [End] INT
)

insert into #Range ([Start], [End]) Values (1, 8)
insert into #Range ([Start], [End]) Values (9, 13)
insert into #Range ([Start], [End]) Values (14, 20)
insert into #Range ([Start], [End]) Values (20, 25)
insert into #Range ([Start], [End]) Values (30, 42)
insert into #Range ([Start], [End]) Values (42, 49)
insert into #Range ([Start], [End]) Values (60, 67)



;with RangeTable as
(select
    t1.[Start],
    t1.[End],
    row_number() over (order by t1.[Start]) as [Index]
from
    #Range t1
where t1.Start not in (select 
                      [End] 
               from
                  #Range
                  Union
               select 
                  [End] + 1
               from
                  #Range
               )
)
select 
    t1.[Start],
    case 
   when t2.[Start] is null then
        (select max([End])
                     from #Range)
       else
        (select max([End])
                     from #Range
                     where t2.[Start] > [End])
end as [End]    
from 
    RangeTable t1
left join 
    RangeTable t2
on
    t1.[Index] = t2.[Index]-1 

drop table #Range;

Bogor answered 31/8, 2011 at 15:45 Comment(3)

Hi Aducci, Your solution works fine also for other data bigger than the data in the example tables. – Vimineous 1/9, 2011 at 9:1

@Liran Ben Yehuda - Was there a reason you unmarked it as an answer? – Bogor 1/9, 2011 at 13:51

Tx for your support. I just look for the best solution and I have to make some performance tests before. – Vimineous 4/9, 2011 at 6:38

Edited to include another version which i think is a bit more reliable, and also works with overlapping ranges

CREATE TABLE #data (start_range INT, end_range INT)
INSERT INTO #data VALUES (1,8) 
INSERT INTO #data VALUES (2,15) 
INSERT INTO #data VALUES (9,13)
INSERT INTO #data VALUES (14,20) 
INSERT INTO #data VALUES (13,26) 
INSERT INTO #data VALUES (12,21) 
INSERT INTO #data VALUES (9,25) 
INSERT INTO #data VALUES (20,25) 
INSERT INTO #data VALUES (30,42) 
INSERT INTO #data VALUES (42,49) 
INSERT INTO #data VALUES (60,67)   

;with ranges as
(
SELECT start_range as level
,end_range as end_range
,row_number() OVER (PARTITION BY (SELECT NULL) ORDER BY start_range) as row
FROM #data
UNION ALL
SELECT
level + 1 as level
,end_range as end_range
,row
From ranges 
WHERE level < end_range
)
,ranges2 AS
(
SELECT DISTINCT 
level
FROM ranges
)
,ranges3 AS
(
SELECT 
level
,row_number() OVER (ORDER BY level) - level as grouping_group
from ranges2
)
SELECT 
MIN(level) as start_number
,MAX(level) as end_number
FROM ranges3
GROUP BY grouping_group
ORDER BY start_number ASC

I think this should work - might not be especially efficient on larger sets though...

CREATE TABLE #data (start_range INT, end_range INT)
INSERT INTO #data VALUES (1,8)
INSERT INTO #data VALUES (2,15)
INSERT INTO #data VALUES (9,13)
INSERT INTO #data VALUES (14,20)
INSERT INTO #data VALUES (21,25)
INSERT INTO #data VALUES (30,42)
INSERT INTO #data VALUES (42,49)
INSERT INTO #data VALUES (60,67)


;with overlaps as
(
select * 
,end_range - start_range as range
,row_number() OVER (PARTITION BY (SELECT NULL) ORDER BY start_range ASC) as line_number
from #data
)
,overlaps2 AS
(
SELECT
O1.start_range
,O1.end_range
,O1.line_number
,O1.range
,O2.start_range as next_range
,CASE WHEN O2.start_range - O1.end_range < 2 THEN 1 ELSE 0 END as overlap
,O1.line_number - DENSE_RANK() OVER (PARTITION BY (CASE WHEN O2.start_range - O1.end_range < 2 THEN 1 ELSE 0 END) ORDER BY O1.line_number ASC) as overlap_group
FROM overlaps O1
LEFT OUTER JOIN overlaps O2 on O2.line_number = O1.line_number + 1
)
SELECT 
MIN(start_range) as range_start
,MAX(end_range) as range_end
,MAX(end_range) - MIN(start_range) as range_span
FROM overlaps2
GROUP BY overlap_group

Canaille answered 31/8, 2011 at 14:47 Comment(9)

+1 Tested here and it worked. Good thing you included the CREATE and the INSERT statements. – Scyros 31/8, 2011 at 14:53

Hi Davin, Your second solution is much more reliable since the first one didn't worked well. Actually, the original tables don't contains any overlaps. If you have any idea how to resole the problem without overlaps in more efficient way I would like to know. Tx for your help :) – Vimineous 1/9, 2011 at 8:45

@Liran Ben Yehuda - in your original question you wanted examples of 1-10,11-20 AND 1-10,10-20 to give a range of 1-20 - so there is overlap as in the second case 10 appears twice, does this mean then that in your actual tables each start and end range value is unique? – Canaille 1/9, 2011 at 8:51

@Devin - Sorry for the confusion. You are right, I meant that overlapping range are not exist such as the second row (2,15) in your example (also the [9,25] row values) – Vimineous 1/9, 2011 at 9:0

@Liran, in that case I think that this should still be relatively efficient as the CTE will only fill in the gaps for ranges which are present and then sees which ones fall within 1 of the previous row - how many ranges worth of data do you have in your actual table? – Canaille 1/9, 2011 at 9:2

@Devin - Sorry, but I cannot estimate how many ranges worth of data will be in the real table... – Vimineous 1/9, 2011 at 9:5

@Devin - Is the Row_Number() function in ranges CTE really necessary in case of the table wont contains any null values ? – Vimineous 1/9, 2011 at 10:13

@Liran, You could drop the Row_number from ranges CTE if you definitely do not have any genuine overlapping rows. I included it so that you could tell from which range row the value came from (to validate overlaps) to make sure that all ranges were being included. – Canaille 1/9, 2011 at 10:30

let us continue this discussion in chat – Vimineous 1/9, 2011 at 11:10

This works for your example, let me know if it doesn't work for other data

create table #Range 
(
  [Start] INT,
  [End] INT
)

insert into #Range ([Start], [End]) Values (1, 8)
insert into #Range ([Start], [End]) Values (9, 13)
insert into #Range ([Start], [End]) Values (14, 20)
insert into #Range ([Start], [End]) Values (20, 25)
insert into #Range ([Start], [End]) Values (30, 42)
insert into #Range ([Start], [End]) Values (42, 49)
insert into #Range ([Start], [End]) Values (60, 67)



;with RangeTable as
(select
    t1.[Start],
    t1.[End],
    row_number() over (order by t1.[Start]) as [Index]
from
    #Range t1
where t1.Start not in (select 
                      [End] 
               from
                  #Range
                  Union
               select 
                  [End] + 1
               from
                  #Range
               )
)
select 
    t1.[Start],
    case 
   when t2.[Start] is null then
        (select max([End])
                     from #Range)
       else
        (select max([End])
                     from #Range
                     where t2.[Start] > [End])
end as [End]    
from 
    RangeTable t1
left join 
    RangeTable t2
on
    t1.[Index] = t2.[Index]-1 

drop table #Range;

Bogor answered 31/8, 2011 at 15:45 Comment(3)

Hi Aducci, Your solution works fine also for other data bigger than the data in the example tables. – Vimineous 1/9, 2011 at 9:1

@Liran Ben Yehuda - Was there a reason you unmarked it as an answer? – Bogor 1/9, 2011 at 13:51

Tx for your support. I just look for the best solution and I have to make some performance tests before. – Vimineous 4/9, 2011 at 6:38

You could use a number table to solve this problem. Basically, you first expand the ranges, then combine subsequent items in groups.

Here's one implementation:

WITH data (START, [END]) AS (
  SELECT  1,  8 UNION ALL
  SELECT  9, 13 UNION ALL
  SELECT 14, 20 UNION ALL
  SELECT 20, 25 UNION ALL
  SELECT 30, 42 UNION ALL
  SELECT 42, 49 UNION ALL
  SELECT 60, 67
),
expanded AS (
  SELECT DISTINCT
    N = d.START + v.number
  FROM data d
    INNER JOIN master..spt_values v ON v.number BETWEEN 0 AND d.[END] - d.START
  WHERE v.type = 'P'
),
marked AS (
  SELECT
    N,
    SeqID = N - ROW_NUMBER() OVER (ORDER BY N)
  FROM expanded
)
SELECT
  START = MIN(N),
  [END] = MAX(N)
FROM marked
GROUP BY SeqID

This solution uses master..spt_values as a number table, for expanding the initial ranges. But if (all or some of) those ranges may span more than 2048 (subsequent) values, then you should define and use your own number table.

Ptarmigan answered 1/9, 2011 at 17:9 Comment(0)

Recommended topics

Hot tags