How to find the boundaries of groups of contiguous sequential numbers?

I have a table with the following definition

CREATE TABLE mytable
  (
     id     INT IDENTITY(1, 1) PRIMARY KEY,
     number BIGINT,
     status INT
  )

and example data

INSERT INTO mytable
VALUES (100,0),
       (101,0),
       (102,0),
       (103,0),
       (104,1),
       (105,1),
       (106,0),
       (107,0),
       (1014,0),
       (1015,0),
       (1016,1),
       (1017,0)

Looking only at the rows where status = 0, how can I collapse the number values into ranges of contiguous sequential numbers and find the start and end of each range?

i.e. for the example data the results would be

          From     To
Number    100      103
Number    106      107
Number    1014     1015
Number    1017     1017
Hemipterous answered 11/6, 2013 at 14:8 Comment(7)
I've done this before, months ago. I don't think I saved the query, but it might still be in my local logs. Hold tight. – Jibe
Google for "sql server gaps and islands". – Underline
Sorry, I can't find it. – Jibe
Oh wow, there's an entire tag on SO for this question: stackoverflow.com/questions/tagged/gaps-and-islands – Jibe
SQL is very inefficient at looking at data across rows without a cursor (and cursors are bad!). I strongly recommend coming up with a way to either store the ranges in another collection, or pull down all the data and crunch the numbers (i.e. calculate the ranges) yourself. With that said, you can likely get away with using a recursive CTE to get the job done. How exactly, I'm not sure. As suggested, check out gaps and islands. – Solarize
Come on, genius, I know you can find the solution ;) – Hemipterous
@EliGassert This is entirely untrue. "Gaps and Islands" problems can be solved very efficiently in SQL without resorting to either cursors or the even more inefficient recursive CTEs. Itzik Ben-Gan has written extensively on how this can be done. Here: sqlmag.com/article/tsql3/calculating-concurrent-sessions-part-3 – Franchescafranchise

As mentioned in the comments, this is a classic gaps and islands problem.

A solution popularized by Itzik Ben-Gan relies on the fact that DENSE_RANK() OVER (ORDER BY number) - number stays constant within an "island" and never takes the same value in two different islands, so it can be used as a grouping key.

WITH T
     AS (SELECT DENSE_RANK() OVER (ORDER BY number) - number AS Grp,
                number
         FROM   mytable
         WHERE  status = 0)
SELECT MIN(number) AS [From],
       MAX(number) AS [To]
FROM   T
GROUP  BY Grp
ORDER  BY MIN(number);
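
To see why the grouping works, it can help to look at the intermediate values before they are aggregated. The following is a minimal sketch that assumes the mytable definition and sample data from the question; the aliases Rnk and Grp are only illustrative.

```sql
-- Show the rank and the derived group key for every status = 0 row.
-- Assumes mytable is populated exactly as in the question.
SELECT number,
       DENSE_RANK() OVER (ORDER BY number)          AS Rnk,
       DENSE_RANK() OVER (ORDER BY number) - number AS Grp
FROM   mytable
WHERE  status = 0
ORDER  BY number;

-- Result for the sample data:
-- number  Rnk  Grp
-- 100     1    -99
-- 101     2    -99
-- 102     3    -99
-- 103     4    -99
-- 106     5    -101
-- 107     6    -101
-- 1014    7    -1007
-- 1015    8    -1007
-- 1017    9    -1008
```

Within a run of consecutive numbers, the rank and the number both increase by 1, so their difference does not change. At a gap the number jumps by more than the rank does, so the difference drops to a new value. DENSE_RANK (rather than ROW_NUMBER) also keeps the difference stable if the same number appears more than once.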

On later versions, my preference is to use LAG/LEAD for this, to reduce the amount of sorting going on. The query below should work on SQL Server 2012 and later.

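WITH T AS
(
    SELECT *,
           -- Running minimum; ordered by number, this is always the smallest number
           MinNumber  = MIN(number) OVER (ORDER BY number ROWS UNBOUNDED PRECEDING),
           NextNumber = LEAD(number) OVER (ORDER BY number)
    FROM   mytable
    WHERE  status = 0
)
-- Only the last row of each island survives the WHERE clause, so LAG reads the
-- previous island's NextNumber, which is the first number of the current island.
SELECT LAG(NextNumber, 1, MinNumber) OVER (ORDER BY number) AS [From],
       number AS [To]
FROM   T
WHERE  NextNumber IS NULL OR NextNumber <> number + 1;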
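
As a rough illustration of how this second query arrives at its result, the sketch below lists the values computed inside the CTE and marks which rows pass the final filter. It assumes the mytable definition and sample data from the question; the IsIslandEnd column is hypothetical and exists only for this illustration.

```sql
-- Intermediate values of the LAG/LEAD approach for every status = 0 row.
-- Assumes mytable is populated exactly as in the question.
SELECT number,
       MIN(number)  OVER (ORDER BY number ROWS UNBOUNDED PRECEDING) AS MinNumber,
       LEAD(number) OVER (ORDER BY number)                          AS NextNumber,
       CASE
           WHEN LEAD(number) OVER (ORDER BY number) = number + 1 THEN 0
           ELSE 1   -- also covers the last row, where LEAD returns NULL
       END AS IsIslandEnd
FROM   mytable
WHERE  status = 0
ORDER  BY number;

-- Result for the sample data:
-- number  MinNumber  NextNumber  IsIslandEnd
-- 100     100        101         0
-- 101     100        102         0
-- 102     100        103         0
-- 103     100        106         1
-- 106     100        107         0
-- 107     100        1014        1
-- 1014    100        1015        0
-- 1015    100        1017        1
-- 1017    100        NULL        1
```

The rows flagged with 1 are the ones the outer query keeps (103, 107, 1015, 1017). For each of them, LAG picks up the NextNumber of the previous kept row (106, 1014, 1017) as the start of the range, and the first kept row falls back to MinNumber (100).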
Springwood answered 11/6, 2013 at 14:34 Comment(5)
+1: Yes, this is the best approach. No cursed cursors or inefficient recursion necessary. – Franchescafranchise
@Franchescafranchise - Very modest! Apologies for crediting the wrong person in my answer! – Springwood
@MartinSmith Yeah, my one claim to fame. :-) Besides, Itzik Ben-Gan really is a genius IMHO. – Franchescafranchise
Why did you use the - in your query? – Hemipterous
@MikeStation - As opposed to what? The Grp value needs to be something constant within an island and unique to that island, and the subtraction achieves that. See the results here, and experiment with changing the example data to see how it works: sqlfiddle.com/#!6/f1040/2 – Springwood
