How can we find gaps in sequential numbering in MySQL?
Asked Answered
K

16

147

We have a database with a table whose values were imported from another system. There is an auto-increment column, and there aren’t any duplicate values, but there are missing values. For example, running this query:

select count(id) from arrc_vouchers where id between 1 and 100

should return 100, but it returns 87 instead. Is there a query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there aren’t any records with id's of 71-82. I want to return 71, 72, 73, etc.

Is this possible?

Kinson answered 2/12, 2010 at 22:45 Comment(5)
This may not work in MySQL, but at work (Oracle) we needed something similar. We wrote a Stored Proc that took a number as the Max value. The Stored Proc then created a temp table with a single column. The table contained all the numbers from 1 to Max. Then it did a NOT IN join between the temp table and our table of interest. If you called it with Max = Select max(id) from arrc_vouchers, it would then return all the missing values.Murraymurre
What's wrong with having gaps in the numbering? The value of a surrogate key generally isn't meaningful; all that matters is that it's unique. If your application can't handle non-contiguous IDs, that's probably a bug in the application, not in the data.Postpaid
In this case it's an issue because the data we inherited from the old system used the auto-increment number associated with a record as a key to print on a physical card that's being handed out to people. This was NOT our idea. In order to find out which cards are missing, we need to know where the gaps are in the sequential numbering.Kinson
xaprb.com/blog/2005/12/06/… select l.id + 1 as start from sequence as l left outer join sequence as r on l.id + 1 = r.id where r.id is null;Vaulting
You can use generate series to generate numbers from 1 to the highest id of your table. Then run a query where id not in this series.Jimmyjimsonweed
B
193

A better answer

JustPlainMJS provided a much better answer in terms of performance.

The (not as fast as possible) answer

Here's a version that works on a table of any size (not just on 100 rows):

SELECT (t1.id + 1) as gap_starts_at,
       (SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
  • gap_starts_at - first id in current gap
  • gap_ends_at - last id in current gap
Berne answered 19/5, 2011 at 11:5 Comment(7)
the only problem with this, is that it doesn't "report" a possible initial gap. e.g. if the first 5 ids are missing (1 through 5) it doesn't show that... How could we show pissible gaps at the very begining?Preeminence
Note: This query doesn't work on temporary tables. My problem was the order number I was searching for gaps in is not distinct (the table stores order lines, so the order number they belong to repeats for each line). 1st query: 2812 rows in set (1 min 31.09 sec). Made another table by selecting distinct order numbers. Your query without my repeats: 1009 rows in set (18.04 sec)Rodriguez
@Preeminence What's wrong with SELECT MIN(id) FROM table?Wilhite
Worked but took about 5 hours to run on a table with 700000 recordsThorin
Thanks for letting us all know the (in)efficiency of the solution, even if it does work, @ThorinSchizothymia
Would using JOIN queries make it any faster? Something like SELECT t1.id + 1 AS gap_starts_at FROM arrc_vouchers t1 LEFT JOIN arrc_vouchers t2 ON t1.id + 1 = t2.id HAVING t2.id IS NULL? The gap_ends_at should work using the same subquery, too lazy to write it.Verbiage
This works great, I just have one question; how can I add a list of "blacklist" numbers that this script ignores? I've tried adding AND YourCol NOT IN (4,5,12) at several places but it doesn't seem to work.Plumbism
B
149

This just worked for me to find the gaps in a table with more than 80k rows:

SELECT
  CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
  SELECT
    @rownum:=@rownum+1 AS expected,
    IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
  FROM
    (SELECT @rownum:=0) AS a
    JOIN YourTable
    ORDER BY YourCol
  ) AS z
WHERE z.got!=0;

Result:

+------------------+
| missing          |
+------------------+
| 1 thru 99        |
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)

Note that the order of columns expected and got is critical.

If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace

(SELECT @rownum:=0) AS a

with

(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a

New result:

+------------------+
| missing          |
+------------------+
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)

If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in Bash.

SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM ( SELECT @rownum:=@rownum+1 AS expected, IF(@rownum=height, 0, @rownum:=height) AS got FROM (SELECT @rownum:=0) AS a JOIN block ORDER BY height ) AS z WHERE z.got!=0;

This produces an output like so

$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)

You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID

for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
  echo $ID
  # Fill the gaps
done

It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.

Beshore answered 19/4, 2015 at 22:32 Comment(16)
Its much more efficient than the accepted answer.Yount
far faster than the accepted answer. The only thing that I'd add is that CONVERT( YourCol, UNSIGNED ) will give better results if YourCol isn't already an integer.Affianced
The only problem with this query is that it does not handle well the duplicated values. If we have for example 1, 2, 3, 3, 4, 5 it will return that 4 is missing.Rootstock
Worked real nice for me to go over 3 million. Pref answer would time out. This one was quick and cleanMauceri
For second query (with GROUP_CONCAT) might be helpful to increase limit of that function: SET SESSION group_concat_max_len = 1000000;Sauls
Nice! Is there a simple way to modify this query to include the final element's index?Camara
@AlexandreCassagne: If I'm understanding your question correctly, I would simply do a separate query like the embedded one for finding the min: SELECT MAX(YourCol) FROM YourTable;Beshore
How to tweak this query so that it would only return first N missing IDs? I.e. - I would like to retrieve first 10 missing primary keys, ordered by their value. No need to report on all gaps.Catercornered
@Catercornered Switch to GROUP_CONCAT variant if needed: SELECT IF((z.got-IF(z.over>0, z.over, 0)-1)>z.expected, CONCAT(z.expected,' thru ',(z.got-IF(z.over>0, z.over, 0)-1)), z.expected) AS missing FROM ( SELECT @rownum:=@rownum+1 AS expected, @target-@missing AS under, (@missing:=@missing+IF(@rownum=YourCol, 0, YourCol-@rownum))-@target AS over, IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got FROM (SELECT @rownum:=0, @missing:=0, @target:=10) AS a JOIN YourTable ORDER BY YourCol ) AS z WHERE z.got!=0 AND z.under>0;Beshore
@Catercornered Well, that was ugly. Note that your N is assigned to @target.Beshore
I am confused that people are saying this is faster. I have something over 2.4 million IDs starting at 1, with a gap from 1388013 to 1388025. This code takes around 8 seconds while the accepted answer by @Berne returns instantly and reports as 0.000 seconds.Rosco
There seems to be some sort of caching going on. But when I change the formatting of the code and execute it again, matt's answer still only takes around 8 seconds.Rosco
This is extremely fast! it returned in 31ms for a query on a table with 10 million recordsEllipsis
Worked well for a million records. Took around 17s.Anciently
@Beshore If I filter down my table by 2 conditions and then order it, it ignores the order by: SELECT CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(',',z.got-1), '')) AS missing FROM ( SELECT @ rownum:=@ rownum+1 AS expected, IF(@ rownum=TRANSACTIONID, 0, @ rownum:=TRANSACTIONID) AS got FROM (SELECT @ rownum:=0) AS a JOIN transactions WHERE USER_ID='552' AND TYPE='2' GROUP BY TRANSACTIONID ORDER BY TRANSACTIONID ) AS z WHERE z.got!=0; If TRANSACTIONID 25 was inserted after 27 due to some issue, it will report it as missing.Ideograph
@PontusEndlessmindHolmberg I no longer have a live MySQL instance to test on, but I believe you need to put the conditions and grouping into a subquery but keep the ordering outside: ... JOIN (SELECT TRANSACTIONID FROM transactions WHERE USER_ID='552' AND TYPE='2' GROUP BY TRANSACTIONID) AS t ORDER BY TRANSACTIONID ...Beshore
S
13

A quick-and-dirty query that should do the trick:

SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM
 (
SELECT a1.id AS a , MIN(a2.id) AS b
FROM arrc_vouchers  AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab

WHERE
b > a + 1

This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between... E.g.,

id  next_id  missing_inbetween
 1        4                  2
68       70                  1
75       87                 11
Sheeran answered 2/12, 2010 at 23:7 Comment(2)
This worked great for me. Thanks.! I was able to easily modify this for my purposes.Hamburger
It seems this is the best answer when looking for 'next id' in gaps. Unfortunately it is EXTREMELY slow for tables with 10K's of rows. I've been waiting for more than 10 minutes on a ~46K table whereas with @ConfexianMJS I got results in less than a second!Malchus
N
12

If you are using a MariaDB database, you have a faster (800%) option using the sequence storage engine:

SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
Novelty answered 10/1, 2017 at 14:33 Comment(3)
to expand on this idea, the max of the sequence can be established using "SELECT MAX(column) FROM table" and setting a variable from the result say $MAX... the sql statement can then be written "SELECT * FROM seq_1_to_". $MAX ." WHERE seq not in (SELECT column FROM table)" my syntax is php basedWaters
or you can use SELECT @var:= max FROM ....; select * from .. WHERE seq < @max; with MySQL variables.Novelty
Percentages do not make much sense above 100% (except for getting a higher number, artificially inflating the claim). Like 40,832,277,770% faster. So it is 9 times faster?Cooker
L
4

Create a temporary table with 100 rows and a single column containing the values 1-100.

Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.

This should work:

select tempid from temptable 
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id 
where arrc_vouchers.id is null
Loudspeaker answered 2/12, 2010 at 22:54 Comment(3)
OK, 1 - 100 was just an easy way to give an example. In this case, we're looking at 20,000 - 85,000. So do I create a temp table with 65,000 rows numbered 20000 - 85000? And how do I go about doing that? I'm using phpMyAdmin; if I set the default value of the column to 25000 and make it auto increment, can I just insert 65,000 rows and it will start the auto-increment with 25000?Kinson
I had a similar situation (I have 100 items in order and need to find missing items in 100). To do this, I created another table 1-100, then execute this statement on it and it works beautifully. This replaces a very complex function to create temporary tables. Just advice for someone in similar situation, it's sometimes faster to create a table than temp tables.Sherer
This is a very elegant and clear solution. Performance is most likely unimportant for these kinds of queries which are probably run only once in a blue moon.Phineas
S
3

An alternative solution that requires a query + some code doing some processing would be:

select l.id lValue, c.id cValue, r.id rValue
  from
  arrc_vouchers l
  right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
  left  join arrc_vouchers r on r.id=c.id+1
where 1=1
  and c.id > 0
  and (l.id is null or r.id is null)
order by c.id asc;

Note that the query does not contain any subselect that we know it's not handled performantly by MySQL's planner.

That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), i.e.:

lValue |cValue|rValue
-------+------+-------
{null} | 2    | 3
8      | 9    | {null}
{null} | 22   | 23
23     | 24   | {null}
{null} | 29   | {null}
{null} | 33   | {null}

Without going into further details (we'll see them in next paragraphs) this output means that:

  • No values between 0 and 2
  • No values between 9 and 22
  • No values between 24 and 29
  • No values between 29 and 33
  • No values between 33 and MAX VALUE

So the basic idea is to do a RIGHT and LEFT joins with the same table seeing if we have adjacents values per value (i.e., if central value is '3' then we check for 3-1=2 at left and 3+1 at right), and when a ROW has a NULL value at RIGHT or LEFT then we know there is no adjacent value.

The complete raw output of my table is:

select * from arrc_vouchers order by id asc;

0
2
3
4
5
6
7
8
9
22
23
24
29
33

Some notes:

  1. The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, therefore it will not allow you to decrease it under zero. This is not strictly necessary if you keep the c.value > 0 as it's stated in the next note, but I'm including it just as doc.
  2. I'm filtering the zero central value as we are not interested in any previous value and we can derive the post value from the next row.
Swayback answered 8/3, 2014 at 4:3 Comment(0)
S
3

If there is a sequence having gap of maximum one between two numbers (like 1,3,5,6) then the query that can be used is:

select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
  • table_name - source1
  • column_name - id
Skiles answered 24/8, 2018 at 11:56 Comment(0)
S
3

I tried it in a different manner, and the best performance that I found was this simple query:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;

... one left join to check if the next id exists, only if next if is not found, then the subquery finds the next id that exists to find the end of gap. I did it because the query with equal (=) is better performance than the greater than (>) operator.

Using the sqlfiddle it does not show so a different performance compared to the other queries, but in a real database this query above results in 3 times faster than the others.

The schema:

CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;

Follow below all the queries that I made to compare the performance:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;
select *, (gapEnd-gapIni) qt
    from (
        select id+1 gapIni
        ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
        from arrc_vouchers a
        order by id
    ) a where gapEnd <> gapIni
;
select id+1 gapIni
    ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
    #,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
    from arrc_vouchers a
    order by id
;

You can see and test my query using this SQL Fiddle:

http://sqlfiddle.com/#!9/6bdca7/1

Suppletion answered 9/2, 2019 at 19:24 Comment(0)
T
2

It is probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e., next available number), and this seems to work fine.

SELECT MIN(l.number_sequence + 1) as nextavabile
from patients as l
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL

Several other scenarios and solutions discussed there, from 2005!

How to Find Missing Values in a Sequence With SQL

Tillman answered 12/4, 2019 at 21:1 Comment(0)
I
2

A simple, yet effective, solution to find the missing auto-increment values:

SELECT `id`+1
FROM `table_name`
WHERE `id`+1 NOT IN (SELECT id FROM table_name)
Interlineate answered 7/4, 2021 at 8:4 Comment(1)
A simple yet wrong solution. If gap consists of multiple missing values (1,2,5), this will only find the firstOxen
C
1

Although these all seem to work, the result set returns in a very lengthy time when there are 50,000 records.

I used this, and it find the gap or the next available (last used + 1) with a much faster return from the query.

SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;
Crissum answered 4/4, 2014 at 15:9 Comment(1)
this finds the first gap which is not what the question was asking for.Chimborazo
F
1

Based on the answer given by matt, this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables &/or columns in a stored procedure.

create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);

set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);

set @strsql=concat("select
    ( t1.",@col," + 1 ) as starts_at,
  ( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
    from ",@tbl," t1
        where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
        having ends_at is not null");

prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end
Frost answered 17/2, 2015 at 12:31 Comment(0)
I
0

Another simple answer that identifies the gaps. We do a query selecting just the odd numbers and we right join it to a query with all the even numbers. As long as you're not missing id 1; this should give you a comprehensive list of where the gaps start.

You'll still have to take a look at that place in the database to figure out how many numbers the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.

SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE a.id IS NULL;
Inborn answered 11/9, 2021 at 21:23 Comment(3)
This gave me an error of "Query 1 ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM (SELECT * FROM" ... think it should be right join not right join from?Anissaanita
@LeonSegal I ran that literal query on my MySQL instance. Maybe its a MySQL Version issue? I'm running 8.0.9.Inborn
l am using mysql 8.0.21Anissaanita
O
0

Starting from the comment posted by user933161,

select l.id + 1 as start from sequence as l inner join sequence as r on l.id + 1 = r.id where r.id is null;

is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.) Also,

insert into sequence (id) values (#);

where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.)

You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.

Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.

Ontario answered 23/2, 2022 at 15:3 Comment(0)
G
0

This works for me:

SELECT distinct(l.membership_no + 1) as nextavabile
from Tablename as l
LEFT OUTER JOIN Tablename as r on l.membership_no + 1 = r.membership_no
WHERE r.membership_no is NULL and l.membership_no is not null order by nextavabile asc;
Gantline answered 8/3, 2022 at 6:20 Comment(0)
D
0
CREATE TABLE arrc_vouchers (id int primary key);
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16);

WITH RECURSIVE odd_num_cte (id) AS  
(  
    SELECT (select min(id) from arrc_vouchers)   
    union all  
    SELECT id+1 from odd_num_cte where id <(SELECT max(id) from arrc_vouchers)   
)  
SELECT cte.id 
from arrc_vouchers ar right outer join odd_num_cte cte on ar.id=cte.id
where ar.id is null;
Delusion answered 15/12, 2022 at 20:52 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Newsletter

© 2022 - 2024 — McMap. All rights reserved.