How to check a sequence efficiently for used and unused values in PostgreSQL
Asked Answered
F

4

2

In PostgreSQL (9.3) I have a table defined as:

CREATE TABLE charts
( recid serial NOT NULL,
  groupid text NOT NULL,
  chart_number integer NOT NULL,
  "timestamp" timestamp without time zone NOT NULL DEFAULT now(),
  modified timestamp without time zone NOT NULL DEFAULT now(),
  donotsee boolean,
  CONSTRAINT pk_charts PRIMARY KEY (recid),
  CONSTRAINT chart_groupid UNIQUE (groupid),
  CONSTRAINT charts_ichart_key UNIQUE (chart_number)
);

CREATE TRIGGER update_modified
  BEFORE UPDATE ON charts
  FOR EACH ROW EXECUTE PROCEDURE update_modified();

I would like to replace the chart_number with a sequence like:

CREATE SEQUENCE charts_chartnumber_seq START 16047;

So that by trigger or function, adding a new chart record automatically generates a new chart number in ascending order. However, no existing chart record can have its chart number changed and over the years there have been skips in the assigned chart numbers. Hence, before assigning a new chart number to a new chart record, I need to be sure that the "new" chart number has not yet been used and any chart record with a chart number is not assigned a different number.

How can this be done?

Frivolous answered 20/9, 2015 at 21:14 Comment(4)
Gaps in a sequence are nothing to worry about. And the sequence will never generate the same number twice (unless you manually mess with it)Smithy
@a_horse_with_no_name Yes...but over time, I would like to fill in the gaps in the sequence so that the chart numbers in use are continuous. :) Thanks.Frivolous
recid is a serial, but chart_number is not? The fact that a sequence charts_chartnumber_seq exists means nothing by itself. assigning a new chart number to a new chart record .. are we talking about INSERT?Radiothermy
@Erwin Brandstetter Yes...I would like to insert a new chart record into the table and have it automatically assigned an unused chart number. The chart numbers should come from the gaps in the sequence until all the "gapped" number have been used. After which point, the simple nextval will be correct.Frivolous
R
5

Consider not doing it. Read these related answers first:

If you still insist on filling in gaps, here is a rather efficient solution:

1. To avoid searching large parts of the table for the next missing chart_number, create a helper table with all current gaps once:

CREATE TABLE chart_gap AS
SELECT chart_number
FROM   generate_series(1, (SELECT max(chart_number) - 1  -- max is no gap
                           FROM charts)) chart_number
LEFT   JOIN charts c USING (chart_number)
WHERE  c.chart_number IS NULL;

2. Set charts_chartnumber_seq to the current maximum and convert chart_number to an actual serial column:

SELECT setval('charts_chartnumber_seq', max(chart_number)) FROM charts;

ALTER TABLE charts
   ALTER COLUMN chart_number SET NOT NULL
 , ALTER COLUMN chart_number SET DEFAULT nextval('charts_chartnumber_seq');

ALTER SEQUENCE charts_chartnumber_seq OWNED BY charts.chart_number; 

Details:

3. While chart_gap is not empty fetch the next chart_number from there. To resolve possible race conditions with concurrent transactions, without making transactions wait, use advisory locks:

WITH sel AS (
   SELECT chart_number, ...  -- other input values
   FROM   chart_gap
   WHERE  pg_try_advisory_xact_lock(chart_number)
   LIMIT  1
   FOR    UPDATE
   )
, ins AS (
   INSERT INTO charts (chart_number, ...) -- other target columns
   TABLE sel 
   RETURNING chart_number
   )
DELETE FROM chart_gap c
USING  ins i
WHERE  i.chart_number = c.chart_number;

Alternatively, Postgres 9.5 or later has the handy FOR UPDATE SKIP LOCKED to make this simpler and faster:

...
   SELECT chart_number, ...  -- other input values
   FROM   chart_gap
   LIMIT  1
   FOR    UPDATE SKIP LOCKED
...

Detailed explanation:

Check the result. Once all rows are filled in, this returns 0 rows affected. (you could check in plpgsql with IF NOT FOUND THEN ...). Then switch to a simple INSERT:

   INSERT INTO charts (...)  -- don't list chart_number
   VALUES (...);  --  don't provide chart_number
Radiothermy answered 21/9, 2015 at 1:26 Comment(0)
K
4

In PostgreSQL, a SEQUENCE ensures the two requirements you mention, that is:

  1. No repeats
  2. No changes once assigned

But because of how a SEQUENCE works (see manual), it can not ensure no-skips. Among others, the first two reasons that come to mind are:

  1. How a SEQUENCE handles concurrent blocks with INSERTS (you could also add that the concept of Cache also makes this impossible)
  2. Also, user triggered DELETEs are an uncontrollable aspect that a SEQUENCE can not handle by itself.

In both cases, if you still do not want skips, (and if you really know what you're doing) you should have a separate structure that assign IDs (instead of using SEQUENCE). Basically a system that has a list of 'assignable' IDs stored in a TABLE that has a function to pop out IDs in a FIFO way. That should allow you to control DELETEs etc.

But again, this should be attempted, only if you really know what you're doing! There's a reason why people don't do SEQUENCEs themselves. There are hard corner-cases (for e.g. concurrent INSERTs) and most probably you're over-engineering your problem case, that probably can be solved in a much better / cleaner way.

Kisser answered 21/9, 2015 at 0:51 Comment(0)
O
3

Sequence numbers usually have no meaning, so why worry? But if you really want this, then follow the below, cumbersome procedure. Note that it is not efficient; the only efficient option is to forget about the holes and use the sequence.

In order to avoid having to scan the charts table on every insert, you should scan the table once and store the unused chart_number values in a separate table:

CREATE TABLE charts_unused_chart_number AS
  SELECT seq.unused
  FROM (SELECT max(chart_number) FROM charts) mx,
       generate_series(1, mx(max)) seq(unused)
  LEFT JOIN charts ON charts.chart_number = seq.unused
  WHERE charts.recid IS NULL;

The above query generates a contiguous series of numbers from 1 to the current maximum chart_number value, then LEFT JOINs the charts table to it and find the records where there is no corresponding charts data, meaning that value of the series is unused as a chart_number.

Next you create a trigger that fires on an INSERT on the charts table. In the trigger function, pick a value from the table created in the step above:

CREATE FUNCTION pick_unused_chart_number() RETURNS trigger AS $$
BEGIN
  -- Get an unused chart number
  SELECT unused INTO NEW.chart_number FROM charts_unused_chart_number LIMIT 1;

  -- If the table is empty, get one from the sequence
  IF NOT FOUND THEN
    NEW.chart_number := next_val(charts_chartnumber_seq);
  END IF;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_charts_cn
BEFORE INSERT ON charts
FOR EACH ROW EXECUTE PROCEDURE pick_unused_chart_number();

Easy. But the INSERT may fail because of some other trigger aborting the procedure or any other reason. So you need a check to ascertain that the chart_number was indeed inserted:

CREATE FUNCTION verify_chart_number() RETURNS trigger AS $$
BEGIN
  -- If you get here, the INSERT was successful, so delete the chart_number
  -- from the temporary table.
  DELETE FROM charts_unused_chart_number WHERE unused = NEW.chart_number;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_charts_verify
AFTER INSERT ON charts
FOR EACH ROW EXECUTE PROCEDURE verify_chart_number();

At a certain point the table with unused chart numbers will be empty whereupon you can (1) ALTER TABLE charts to use the sequence instead of an integer for chart_number; (2) delete the two triggers; and (3) the table with unused chart numbers; all in a single transaction.

Outrun answered 21/9, 2015 at 1:2 Comment(3)
You'll also want a trigger that inserts values into the table of free numbers when there's a delete. You do not need a separate trigger for after insert, it's fine to do it in the before trigger since it won't be visible to other tx's until committed. In fact you need to in order to ensure that multi-row inserts don't get duplicate IDs.Curtate
@Outrun This is also the answer I need, but I don't know how to check more the one answer. Thank you.Frivolous
No, you can accept only one answer. You can un-accept the answer Erwin gave and then accept this answer, but that seems a bit over the top.Outrun
C
1

While what you want is possible, it can't be done using only a SEQUENCE and it requires an exclusive lock on the table, or a retry loop, to work.

You'll need to:

  • LOCK thetable IN EXCLUSIVE MODE
  • Find the first free ID by querying for the max id then doing a left join over generate_series to find the first free entry. If there is one.
  • If there is a free entry, insert it.
  • If there is no free entry, call nextval and return the result.

Performance will be absolutely horrible, and transactions will be serialized. There'll be no concurrency. Also, unless the LOCK is the first thing you run that affects that table, you'll face deadlocks that cause transaction aborts.

You can make this less bad by using an AFTER DELETE .. FOR EACH ROW trigger that keeps track of entries you delete by INSERTing them into a one-column table that keeps track of spare IDs. You can then SELECT the lowest ID from the table in your ID assignment function on the default for the column, avoiding the need for the explicit table lock, the left join on generate_series and the max call. Transactions will still be serialized on a lock on the free IDs table. In PostgreSQL you can even solve that using SELECT ... FOR UPDATE SKIP LOCKED. So if you're on 9.5 you can actually make this non-awful, though it'll still be slow.

I strongly advise you to just use a SEQUENCE directly, and not bother with re-using values.

Curtate answered 21/9, 2015 at 1:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.