SQL - Find missing int values in mostly ordered sequential series
Asked Answered
E

6

2

I manage a message based system in which a sequence of unique integer ids will be entirely represented at the end of the day, though they will not necessarily arrive in order.

I am looking for help in finding missing ids in this series using SQL. If my column values are something like the below, how can I find which ids I am missing in this sequence, in this case 6?

The sequence will begin and end at an arbitrary point each day, so min and max would differ upon each run. Coming from a Perl background I through some regex in there.

ids
1
2
3
5
4
7
9
8
10

Help would be much appreciated.

Edit: We run oracle

Edit2: Thanks all. I'll be running through your solutions next week in the office.

Edit3: I settled for the time being on something like the below, with ORIG_ID being the original id column and MY_TABLE being the source table. In looking closer at my data, there are a variety of cases beyond just number data in a string. In some cases there is a prefix or suffix of non-numeric characters. In others, there are dashes or spaces intermixed into the numeric id. Beyond this, ids periodically appear multiple times, so I included distinct.

I would appreciate any further input, specifically in regard to the best route of stripping out non-numeric characters.

SELECT 
   CASE
      WHEN NUMERIC_ID + 1 = NEXT_ID - 1
         THEN TO_CHAR( NUMERIC_ID + 1 )
      ELSE TO_CHAR( NUMERIC_ID + 1 ) || '-' || TO_CHAR( NEXT_ID - 1 )
   END
   MISSING_SEQUENCES
   FROM
   (
      SELECT
         NUMERIC_ID,
         LEAD (NUMERIC_ID, 1, NULL)
            OVER 
            (
               ORDER BY
                 NUMERIC_ID
                 ASC
            )
            AS NEXT_ID
         FROM 
         (
             SELECT
                DISTINCT TO_NUMBER( REGEXP_REPLACE(ORIG_ID,'[^[:digit:]]','') ) 
                AS NUMERIC_ID
             FROM MY_TABLE
         )
    ) WHERE NEXT_ID != NUMERIC_ID + 1
Elgar answered 3/12, 2011 at 0:5 Comment(4)
I read the title as "SELECT * FROM FOO MOSTLY ORDER BY..."Loads
You should add a tag for the brand of RDBMS you're using, e.g. sql-server, oracle, postgresql, mysql, etc.Salyers
How would you know that id=11 and id=0 aren't missing?Teacake
I wouldn't necessarily, but this is more of a sanity check to trigger further investigation.Elgar
B
6

I've been there.

FOR ORACLE:

I found this extremely useful query on the net a while ago and noted down, however I don't remember the site now, you may search for "GAP ANALYSIS" on Google.

SELECT   CASE
             WHEN ids + 1 = lead_no - 1 THEN TO_CHAR (ids +1)
          ELSE TO_CHAR (ids + 1) || '-' || TO_CHAR (lead_no - 1)
         END
             Missing_track_no
   FROM   (SELECT   ids,
                    LEAD (ids, 1, NULL)
                     OVER (ORDER BY ids ASC)
                        lead_no
             FROM   YOURTABLE
             )
   WHERE   lead_no != ids + 1

Here, the result is:

MISSING _TRACK_NO
-----------------
       6

If there were multiple gaps,say 2,6,7,9 then it would be:

MISSING _TRACK_NO
-----------------
        2
       6-7
        9
Blackfish answered 3/12, 2011 at 15:40 Comment(5)
Interesting. I'll have to give it a when I'm back in the office. Thanks.Elgar
One small extra wrinkle. In looking closer, despite the data being numberic, the ids column is varchar2. I assume I can use TO_NUMBER for references to ids in the above?Elgar
Try it and share your result. The '-' wont be stored in numeric column.Blackfish
TO_NUMBER fit the majority of my use cases. I had a couple situations where the numeric id was prefixed or suffixed by a variety of letter characters. I ran a regex on the id before using TO_NUMBER to convert it. This works, though I'm not sure if a REGEX_REPLACE is the best route for such a task.Elgar
I don't know your overall system, so I am not able to give you a %100 solution. However, you may try ELSE TO_CHAR (ids + 1) || TO_CHAR (lead_no - 1). Instead of adding a '-'. So you won't need to use regex. It will be 67 instead of 6-7. I think it is not hard to understand the missing ones are 6 & 7. But again, it is your choice.Blackfish
S
4

This is sometimes called an exclusion join. That is, try to do a join and return only rows where there is no match.

SELECT t1.value-1
FROM ThisTable AS t1
LEFT OUTER JOIN ThisTable AS t2
  ON t1.id = t2.value+1
WHERE t2.value IS NULL

Note this will always report at least one row, which will be the MIN value.

Also, if there are gaps of two or more numbers, it will only report one missing value.

Salyers answered 3/12, 2011 at 0:14 Comment(1)
Thanks, I suppose I could do the same for with value + 1 to find the other bound of my missing ranges. Is there any way to do this in a single query and maybe concat chars to the end of each type of bound to distinguish them?Elgar
P
3

You didn't state your DBMS, so I'm assuming PostgreSQL:

select aid as missing_id
from generate_series( (select min(id) from message), (select max(id) from message)) as aid
  left join message m on m.id = aid
where m.id is null;  

This will report any missing value in a sequence between the minimum and maximum id in your table (including gaps that are bigger than one)

psql (9.1.1)
Type "help" for help.

postgres=> select * from message;
 id
----
  1
  2
  3
  4
  5
  7
  8
  9
 11
 14
(10 rows)


postgres=> select aid as missing_id
postgres-> from generate_series( (select min(id) from message), (select max(id) from message)) as aid
postgres->   left join message m on m.id = aid
postgres-> where m.id is null;
 missing_id
------------
          6
         10
         12
         13
(4 rows)
postgres=>
Punctilio answered 3/12, 2011 at 0:51 Comment(1)
Thaaaanks! Works for me!Innerdirected
B
0

I applied it in mysql, it worked ..

mysql> select * from sequence;
+--------+
| number |
+--------+
|      1 |
|      2 |
|      4 |
|      6 |
|      7 |
|      8 |
+--------+
6 rows in set (0.00 sec)

mysql> SELECT t1.number - 1 FROM sequence AS t1 LEFT OUTER JOIN sequence AS t2 O
N t1.number = t2.number +1 WHERE t2.number IS NULL;
+---------------+
| t1.number - 1 |
+---------------+
|             0 |
|             3 |
|             5 |
+---------------+
3 rows in set (0.00 sec)
Ballflower answered 3/12, 2011 at 0:24 Comment(0)
C
0
SET search_path='tmp';

DROP table tmp.table_name CASCADE;
CREATE table tmp.table_name ( num INTEGER NOT NULL PRIMARY KEY);
-- make some data
INSERT INTO tmp.table_name(num) SELECT generate_series(1,20);
-- create some gaps
DELETE FROM tmp.table_name WHERE random() < 0.3 ;

SELECT * FROM table_name;

-- EXPLAIN ANALYZE
WITH zbot AS (
    SELECT 1+tn.num  AS num
    FROM table_name tn
    WHERE NOT EXISTS (
        SELECT * FROM table_name nx
        WHERE nx.num = tn.num+1
        )
    )
, ztop AS (
    SELECT -1+tn.num  AS num
    FROM table_name tn
    WHERE NOT EXISTS (
        SELECT * FROM table_name nx
        WHERE nx.num = tn.num-1
        )
    )
SELECT zbot.num AS bot
    ,ztop.num AS top
FROM zbot, ztop
WHERE zbot.num <= ztop.num
AND NOT EXISTS ( SELECT *
    FROM table_name nx
    WHERE nx.num >= zbot.num
    AND nx.num <= ztop.num
    )
ORDER BY bot,top
    ;

Result:

CREATE TABLE
INSERT 0 20
DELETE 9
 num 
-----
   1
   2
   6
   7
  10
  11
  13
  14
  15
  18
  19
(11 rows)

 bot | top 
-----+-----
   3 |   5
   8 |   9
  12 |  12
  16 |  17
(4 rows)

Note: a recursive CTE is also possible (and probably shorter).

UPDATE: here comes the recursive CTE ...:

WITH RECURSIVE tree AS (
    SELECT 1+num AS num
    FROM table_name t0
    UNION
    SELECT 1+num FROM tree tt
    WHERE EXISTS ( SELECT *
        FROM table_name xt
        WHERE xt.num > tt.num
        )
    )
SELECT * FROM tree
WHERE NOT EXISTS (
    SELECT *
    FROM table_name nx
    WHERE nx.num = tree.num
    )
ORDER BY num
    ;

Results: (same data)

 num 
-----
   3
   4
   5
   8
   9
  12
  16
  17
  20
 (9 rows)
Cabbageworm answered 3/12, 2011 at 15:40 Comment(0)
B
0
select student_key, next_student_key
      from (
    select student_key, lead(student_key) over (order by student_key) next_fed_cls_prgrm_key
      from student_table
           )
where student_key <> next_student_key-1;
Bosco answered 31/7, 2013 at 15:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.