I manage a message based system in which a sequence of unique integer ids will be entirely represented at the end of the day, though they will not necessarily arrive in order.
I am looking for help in finding missing ids in this series using SQL. If my column values are something like the below, how can I find which ids I am missing in this sequence, in this case 6
?
The sequence will begin and end at an arbitrary point each day, so min and max would differ upon each run. Coming from a Perl background I through some regex in there.
ids
1
2
3
5
4
7
9
8
10
Help would be much appreciated.
Edit: We run oracle
Edit2: Thanks all. I'll be running through your solutions next week in the office.
Edit3: I settled for the time being on something like the below, with ORIG_ID being the original id column and MY_TABLE being the source table. In looking closer at my data, there are a variety of cases beyond just number data in a string. In some cases there is a prefix or suffix of non-numeric characters. In others, there are dashes or spaces intermixed into the numeric id. Beyond this, ids periodically appear multiple times, so I included distinct.
I would appreciate any further input, specifically in regard to the best route of stripping out non-numeric characters.
SELECT
CASE
WHEN NUMERIC_ID + 1 = NEXT_ID - 1
THEN TO_CHAR( NUMERIC_ID + 1 )
ELSE TO_CHAR( NUMERIC_ID + 1 ) || '-' || TO_CHAR( NEXT_ID - 1 )
END
MISSING_SEQUENCES
FROM
(
SELECT
NUMERIC_ID,
LEAD (NUMERIC_ID, 1, NULL)
OVER
(
ORDER BY
NUMERIC_ID
ASC
)
AS NEXT_ID
FROM
(
SELECT
DISTINCT TO_NUMBER( REGEXP_REPLACE(ORIG_ID,'[^[:digit:]]','') )
AS NUMERIC_ID
FROM MY_TABLE
)
) WHERE NEXT_ID != NUMERIC_ID + 1