How to group ranged values using SQL Server

Asked 23/1, 2009 at 12:38 Answered 23/1, 2009 at 14:58

Solved sql-server t-sql common-table-expression gaps-and-islands

I have a table of values like this

978412, 400
978813, 20
978834, 50
981001, 20

As you can see the second number when added to the first is 1 number before the next in the sequence. The last number is not in the range (doesnt follow a direct sequence, as in the next value). What I need is a CTE (yes, ideally) that will output this

978412, 472
981001, 20

The first row contains the start number of the range then the sum of the nodes within. The next row is the next range which in this example is the same as the original data.

Ruff answered 23/1, 2009 at 12:38 Comment(3)

If you've ordered by the first column then the last number is in sequence as it's greater than any other number (981001 > 978834). How did 400 become 472? – Snowplow 23/1, 2009 at 12:53

978412 is the first number in the range, the 472 is the number of nodes within that range (difference between last number and first - last number being 978834+50). – Ruff 23/1, 2009 at 12:58

I believe there are articles on this type of query , its "called Islands and Gaps" .. but I am struggling to work out how it is done. – Ruff 23/1, 2009 at 12:58

From the article that Josh posted, here's my take (tested and working):

SELECT
    MAX(t1.gapID) as gapID,
    t2.gapID-MAX(t1.gapID)+t2.gapSize as gapSize
    -- max(t1) is the specific lower bound of t2 because of the group by.
FROM
  ( -- t1 is the lower boundary of an island.
    SELECT gapID
    FROM gaps tbl1 
    WHERE
      NOT EXISTS(
        SELECT *
        FROM gaps tbl2 
        WHERE tbl1.gapID = tbl2.gapID + tbl2.gapSize + 1
      )
  ) t1
  INNER JOIN ( -- t2 is the upper boundary of an island.
    SELECT gapID, gapSize
    FROM gaps tbl1 
    WHERE
      NOT EXISTS(
        SELECT * FROM gaps tbl2 
        WHERE tbl2.gapID = tbl1.gapID + tbl1.gapSize + 1
      )
  ) t2 ON t1.gapID <= t2.gapID -- For all t1, we get all bigger t2 and opposite.
GROUP BY t2.gapID, t2.gapSize

Zaxis answered 23/1, 2009 at 14:58 Comment(1)

By the article benchmarks, it should be the fastest set based solution. – Zaxis 23/1, 2009 at 15:28

Check out this MSDN Article. It gives you a solution to your problem, if it will work for you depends on the ammount of data you have and your performance requirements for the query.

Edit:

Well using the example in the query, and going with his last solution the second way to get islands (first way resulted in an error on SQL 2005).

SELECT MIN(start) AS  startGroup, endGroup, (endgroup-min(start) +1) as NumNodes
FROM (SELECT g1.gapID AS start,
(SELECT min(g2.gapID) FROM #gaps g2 
WHERE g2.gapID >= g1.gapID and NOT EXISTS
(SELECT * FROM #gaps g3 
WHERE g3.gapID - g2.gapID = 1)) as endGroup
FROM #gaps g1) T1 GROUP BY endGroup

The thing I added is (endgroup-min(start) +1) as NumNodes. This will give you the counts.

Aspect answered 23/1, 2009 at 13:50 Comment(0)

Edit:

Recommended topics

Hot tags