How can you find ID gaps in a MySQL recordset?
Asked Answered
T

4

5

The issue here is related to another question I had...

I have millions of records, and the ID of each of those records is auto-incremented, unfortunately sometimes the ID that is generated is sometimes thrown away so there are many many gaps between IDs.

I want to find the gaps, and re-use the ids that were abandoned.

What's an efficient way to do this in MySQL?

Typhogenic answered 9/12, 2011 at 17:58 Comment(6)
Related: #3718729Rorie
If you're using an INT for your primary key, you can have like 2 billion+ records. Why bother trying to fill the gaps? Are you running out of numbers? I find that there is an advantage to knowing that the numbers correspond to the order that the records were added.Marleah
Maybe you'll run into less performance troubles by changing your primary key type to BIGINT (if 4 billon values provided by INT is too short) than trying to reuse IDs on a very big table.Rorie
+1 for good feedback here. I've not considered that maybe its just better to not worry about the gaps.Typhogenic
Some others before you had the idea of reusing abandoned ids (in some cases citizen identity numbers belonging to dead people), and that "savvy" decision lead to infinity of problems for those people inheriting the reused id's. I would no recommend in any way doing such a thing.Seville
@Seville reading this comment again gave me a good laugh, it really illustrates the problem well.Typhogenic
M
17

First of all, what advantage are you trying to get by reusing the skipped values? An ordinary INT UNSIGNED will let you count up to 4,294,967,295. With "millions of records" your database would have to grow a thousand times over before running out of valid IDs. (And then using a BIGINT UNSIGNED will bump you up to 18,446,744,073,709,551,615 values.)

Trying to recycle values MySQL has skipped is likely to use up a lot of your time trying to compensate for something that really doesn't bother MySQL in the first place.

With that said, you can find missing IDs with something like:

SELECT id + 1
FROM the_table
WHERE NOT EXISTS (SELECT 1 FROM the_table t2 WHERE t2.id = the_table.id + 1);

This will find only the first missing number in each sequence (e.g., if you have {1, 2, 3, 8, 10} it will find {4,9}) but it's likely to be efficient, and of course once you've filled in an ID you can always run it again.

Malachy answered 9/12, 2011 at 18:7 Comment(2)
if 1 is the first gap it will not be returnedPredominant
In my case each missing number is important, so is the last paragraph of answer :) +1 UpvoteMelatonin
W
2

The following will return a row for each gap in the integer field "n" in mytab:

/* cs will contain 1 row for each contiguous sequence of integers in mytab.n
   and will have the start of that chain.
   ce will contain the end of that chain */
create temporary table cs (row int auto_increment primary key, n int);
create temporary table ce like cs;
insert into cs (n) select n from mytab where n-1 not in (select n from mytab) order by n;
insert into ce (n) select n from mytab where n+1 not in (select n from mytab) order by n;
select ce.n + 1 as bgap, cs.n - 1 as egap
  from cs, ce where cs.row = ce.row + 1;

If instead of the gaps you want the contiguous chains then the final select should be:

select cs.n as bchain, ce.n as echain from cs,ce where cs.row=ce.row;
Wastage answered 30/12, 2012 at 17:52 Comment(1)
the second query ''select cs.n as bchain, ce.n as echain from cs,ce where cs.row=ce.row;'' the join on display larger gap that actually exists but the first one works just fine.Ceiba
L
1

This solution is better, in case you need to include the first element as 1:

SELECT
    1 AS gap_start,
    MIN(e.id) - 1 AS gap_end
FROM
    factura_entrada e
WHERE
    NOT EXISTS(
        SELECT
            1
        FROM
            factura_entrada
        WHERE
            id = 1
    )
LIMIT 1
UNION
    SELECT
        a.id + 1 AS gap_start,
        MIN(b.id)- 1 AS gap_end
    FROM
        factura_entrada AS a,
        factura_entrada AS b
    WHERE
        a.id < b.id
    GROUP BY
        a.id
    HAVING
        gap_start < MIN(b.id);
Lacerate answered 17/6, 2013 at 13:26 Comment(0)
C
1

If you are using an MariaDB you have a faster option

SELECT * FROM seq_1_to_50000 where seq not in (select col from table);

docs: https://mariadb.com/kb/en/mariadb/sequence/

Caterwaul answered 10/1, 2017 at 14:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.