PostgreSQL sequence based on another column
Asked Answered
C

8

69

Lets say I have a table as such:

Column   |     Type    |                        Notes
---------+------------ +----------------------------------------------------------
 id      | integer     | An ID that's FK to some other table
 seq     | integer     | Each ID gets its own seq number
 data    | text        | Just some text, totally irrelevant.

id + seq is a combined key.

What I'd like to see is:

ID  | SEQ   |                        DATA
----+------ +----------------------------------------------
 1  | 1     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 1  | 2     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 1  | 3     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 1  | 4     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 2  | 1     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 3  | 1     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 3  | 2     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 3  | 3     | Quick brown fox, lorem ipsum, lazy dog, etc etc.
 3  | 4     | Quick brown fox, lorem ipsum, lazy dog, etc etc.

As you can see, a combination of id and seq is unique.

I'm not sure how to set up my table (or insert statement?) to do this. I'd like to insert id and data, resulting in seq being a sub-sequence dependent on id.

Consultant answered 25/7, 2011 at 20:22 Comment(10)
If seq reflects (or should reflect) the order in which the rows are inserted, I'd rather use a timestamp that gets populated automatically and generate a seq number on the fly when selecting the rows.Dygert
What should happen when a record is deleted? (or updated)Mccrae
If this was my problem, I'd accept horse's comment as the answer.Handpick
I agree with @joop, any deletes could make seq unreliable if it's generated on-the-fly. What problem do you want to solve with this construct? (f.ex. if your only goal is to make id, seq pairs unique, a single sequence will do that -- in fact it'll make seq unique, but that implies id, seq pairs uniqueness)Emlynn
@fthiella, just curious, what is the practical use of such seq column? Depending on its intended use there can be different approaches. One important question here is: is it OK to have gaps in the sequence (due to deleted rows or incomplete rolled back transactions)? If gaps are not OK, then it would be expensive to recalculate the sequence if it is persisted, which means that it may be better to generate it on the fly when needed. If gaps are OK, then single global sequence (standard auto-increment column) is enough.Epochmaking
@VladimirBaranov it's fine for me to have gaps. The pratical use is that I can have multiple tables with multiple sequences, e.g. tbl_2014, tbl_2015, ... but I was curious to now if I could reproduce the same situation with a single table and multiple sequences. Of course, since gaps are fine, I could just use a single seq, you're perfectly right :) but I offered a bounty also for learning purposes :)Causerie
The question is still lacking essential details that cannot be substituted with a bounty. Do you need sequential numbers in seq without gaps? Can there be updates and deletes? What is your current Postgres version? Is the table going to be small or big? How important is performance important? Concurrent access? Concurrent write access? A plain serial column as seq plus a view with row_number() OVER (PARTITION BY id ORDER BY seq) AS seq is probably your best bet: #24919052Aero
@ErwinBrandstetter you're perfectly right, the question is a bit too vague and there can be a lot of different approaches. I don't know about the OP, but I'm fine with gaps and there can be updates and deletes. Once a seq number has been used I don't want to be used again, so maybe the best answer for me would be to.. just use a single seq, as vladimir suggested! But I also like to see what the different approaches can be.Causerie
@fthiella: A plain serial is the way to go.Aero
A few answers are suggesting MAX(seq)+1. This may be problematic for environments with multiple connections. See my comment to answer below.Danndanna
N
49

No problem! We're going to make two tables, things and stuff. stuff will be the table you describe in your question, and things is the one it refers to:

CREATE TABLE things (
    id serial primary key,
    name text
);

CREATE TABLE stuff (
    id integer references things,
    seq integer NOT NULL,
    notes text,
    primary key (id, seq)
);

Then we'll set things up with a trigger that will create a new sequence every time a row is created:

CREATE FUNCTION make_thing_seq() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
begin
  execute format('create sequence thing_seq_%s', NEW.id);
  return NEW;
end
$$;

CREATE TRIGGER make_thing_seq AFTER INSERT ON things FOR EACH ROW EXECUTE PROCEDURE make_thing_seq();

Now we'll end up with thing_seq_1, thing_seq_2, etc, etc...

Now another trigger on stuff so that it uses the right sequence each time:

CREATE FUNCTION fill_in_stuff_seq() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
begin
  NEW.seq := nextval('thing_seq_' || NEW.id);
  RETURN NEW;
end
$$;

CREATE TRIGGER fill_in_stuff_seq BEFORE INSERT ON stuff FOR EACH ROW EXECUTE PROCEDURE fill_in_stuff_seq();

That'll ensure that when rows go into stuff, the id column is used to find the right sequence to call nextval on.

Here's a demonstration:

test=# insert into things (name) values ('Joe');
INSERT 0 1
test=# insert into things (name) values ('Bob');
INSERT 0 1
test=# select * from things;
 id | name
----+------
  1 | Joe
  2 | Bob
(2 rows)

test=# \d
              List of relations
 Schema |     Name      |   Type   |  Owner
--------+---------------+----------+----------
 public | stuff         | table    | jkominek
 public | thing_seq_1   | sequence | jkominek
 public | thing_seq_2   | sequence | jkominek
 public | things        | table    | jkominek
 public | things_id_seq | sequence | jkominek
(5 rows)

test=# insert into stuff (id, notes) values (1, 'Keychain');
INSERT 0 1
test=# insert into stuff (id, notes) values (1, 'Pet goat');
INSERT 0 1
test=# insert into stuff (id, notes) values (2, 'Family photo');
INSERT 0 1
test=# insert into stuff (id, notes) values (1, 'Redundant lawnmower');
INSERT 0 1
test=# select * from stuff;
 id | seq |        notes
----+-----+---------------------
  1 |   1 | Keychain
  1 |   2 | Pet goat
  2 |   1 | Family photo
  1 |   3 | Redundant lawnmower
(4 rows)

test=#
Nickname answered 13/5, 2015 at 2:55 Comment(12)
this looks promising, it looks like what I was looking for, I haven't tried it yet but I'll give you feedback soonCauserie
The function make_thing_seq() will fail for the second insert of the same id value because you are not checking if such a sequence already exists.Dygert
Uhh it is using the id column which is a primary key, and thus unique. Trying to insert the same value of id will fail well before you get to the trigger function.Nickname
things.id is the PK, but nothing keeps me from deleting and re-inserting the same id. UPDATE isn't covered, either. An AFTER trigger is too late in cases where you insert parent and child rows in the same statement. (Trigger on the child table runs BEFORE.) Even if it didn't, a data-modifying CTE manipulates both tables virtually at the same time. There are multiple ways how this can fail. Even while it works, sequences don't guarantee sequential numbers to begin with. Gaps in the numbering are to be expected.Aero
I carefully read all answers and all comments. I understant that this answer is not always the best option and that under some circumstances it might fail, but for learning purposes it was the most useful for me so I decided to award this one. Other answers are good also. I don't know about the OP, but for my specific problem a single seq would be enough, yes someone suggested it but unfortunately nobody posted it as an answer...! Thanks to everyone, all comments and answers have been very useful.Causerie
@ErwinBrandstetter if delete then reinsert the same id, it create a gap between seq column, I think just fine? what problem if I have gap between them?Gil
@truongnm: No problem unless you expect otherwise. My suggestion is a plain serial column where gaps are to be expected.Aero
create sequence IF NOT EXISTS would definitely fix the possible problem of deleting and re-inserting the master ID.Petrina
Well, I See this satisfying as an idea as well as an implmentation, if any one find a gape within this solution! it's still the way to go. hope this for near future build in, withe performance.Moore
I much prefer Erwin's approach using ROW_NUMBER() and a view, where ROW_NUMBER() produces a virtual sequence (with no gaps). Generating a sequence for every unique group is .. a lot of extra database objects.Joanjoana
Excellent approach! If you're using UUIDs you'll need to do this to the ID before putting it into the sequence name translate(NEW.id::text, '-','_')Apparently
If using UUIDs you can also wrap your sequence in double quotes when creating and accessing instead of converting - to _.Dollar
T
27

You could use a window function to assign your SEQ values, something like:

INSERT INTO YourTable
    (ID, SEQ, DATA)
    SELECT ID, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY DATA), DATA
        FROM YourSource
Tearoom answered 25/7, 2011 at 20:27 Comment(3)
Interesting method... (Looking for something like this for ages!) what are the consequences of using partitioning/row number, Is it safe? when might it not work?Quietude
Window functions are very common and you probably aren't going to run into cases "when it might not work". Check out the link in Joe's answer. Once you start using them it will open up a new world of possibilities in your SQL statements.Rogovy
This should fail if you are ever going to delete any but the last record. In this case the row_number() will collide with the last record. In other words this is going to fail in a lot of very common scenarios at some point.Jurkoic
D
7

If seq reflects (or should reflect) the order in which the rows are inserted, I'd rather use a timestamp that gets populated automatically and generate the sequence number on the fly when selecting the rows using row_number():

create table some_table
( 
  id          integer   not null,
  inserted_at timestamp not null default current_timestamp,
  data text
);

The to get the seq column, you can do:

select id,  
       row_number() over (partition by id order by inserted_at) as seq,
       data
from some_table
order by id, seq;

The select is however going to be a bit slower compared to using a persisted seq column (especially with an index on id, seq).

If that becomes a problem you can either look into using a materialized view, or adding the seq column and then updating it on a regular basis (I would not do this in a trigger for performance reasons).

SQLFiddle example: http://sqlfiddle.com/#!15/db69b/1

Dygert answered 13/5, 2015 at 5:29 Comment(4)
maybe it's better to use a sequence instead of a timestamp? can it happen that two rows share the same timestamp? the idea is very simple but good, but if you delete a record the sequence will be calculated again, I don't know about the OP but I prefer to have gapsCauserie
@fthiella: you can't make a sequence dependent on the id column, you would need one sequence for each possible value of id to achieve this (essentially what Jay is suggesting in his answer)Dygert
Since current_timestamp is the start of the transaction, it's identical for all the rows you insert in your sqlfiddle example (and any other single transaction). Is row_number() then just a function the order the rows are read off the disk in? Is that guaranteed to remain stable in operation / across backups and restores?Nickname
You could write conditional 'nextval' in the default, which requires to read the value of the company name in the script.Fishwife
F
1

Just a guess.

INSERT INTO TABLE (ID, SEQ, DATA)
VALUES
(
 IDVALUE,
 (SELECT max(SEQ) +1 FROM TABLE WHERE ID = IDVALUU),
 DATAVALUE
);
Founder answered 12/5, 2015 at 15:1 Comment(3)
That's essentially Joe's answer just not as efficientDygert
Isn't table exclusive lock needed for correct work? What if two inserts like this run concurently?Monkshood
I think these max+1 solutions may prove to be unreliable. To test on Postgresql, I created tbl and inserted one row: id=1. I then opened two connections and started a transaction on each. I executed INSERT INTO tbl SELECT MAX(id)+1 FROM tbl. The first insert completed and the second waited for the first, as expected. I committed the first transaction and the second one immediately output: ERROR: duplicate key value violates unique constraint "tbl_pkey" DETAIL: Key (id)=(2) already exists. I committed the second transaction and it automatically rolled back.Danndanna
L
0

Here's a simple way using standard SQL:

INSERT INTO mytable (id, seq, data)
SELECT << your desired ID >>,
       COUNT(*) + 1,
       'Quick brown fox, lorem ipsum, lazy dog, etc etc.'
FROM mytable
WHERE id = << your desired ID (same as above) >>;

See SQL Fiddle Demo.

(If you wanted to be a bit cleverer you could consider creating a trigger to update the row using the same method immediately after an insert.)

Lilia answered 14/5, 2015 at 15:14 Comment(0)
H
0

I had the same need to dynamicaly store a tree-like structure, not to add all IDs at once.
I prefer not use sequence table for each group as there could be thousands of them.
It run in an intensive multi-processing environment, so it has to be race-condition-proof.
Here the insert fonction for the 1st level. Other levels follow the same principle.

Each group as independent non-reusable sequencial IDs, the function receives a group name & sub-group name, gives you the existing ID or creates it & returns the new ID.
I tried a loop to have a single select, but the code is as long & harder to read.

CREATE OR REPLACE FUNCTION getOrInsert(myGroupName TEXT, mySubGroupName TEXT)
  RETURNS INT AS
$BODY$
DECLARE
   myId INT;
BEGIN -- 1st try to get it if it already exists
   SELECT id INTO myId FROM myTable
      WHERE groupName=myGroupName AND subGroupName=mySubGroupName;
   IF NOT FOUND THEN
      -- Only 1 session can get it but others can read
      LOCK TABLE myTable IN SHARE ROW EXCLUSIVE MODE; 
      -- 2nd try in case of race condition
      SELECT id INTO myId FROM myTable
         WHERE groupName=myGroupName AND subGroupName=mySubGroupName;
      IF NOT FOUND THEN -- Doesn't exist. Get next ID for this group.
         SELECT COALESCE(MAX(id), 0)+1 INTO myId FROM myTable
            WHERE groupName=myGroupName;
         INSERT INTO myTable (groupName, id, subGroupName)
            VALUES (myGroupName, myId, mySubGroupName);
      END IF;
   END IF;
   RETURN myId;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE COST 100;

To try it:

CREATE TABLE myTable (GroupName TEXT, SubGroupName TEXT, id INT);
SELECT getOrInsert('groupA', 'subgroupX'); -- Returns 1
...
SELECT * FROM myTable;
 groupname | subgroupname | id 
-----------+--------------+----
 groupA    | subgroupX    |  1
 groupA    | subgroupY    |  2
 groupA    | subgroupZ    |  3
 groupB    | subgroupY    |  1
Hoosegow answered 10/4, 2019 at 19:59 Comment(0)
J
-3

PostgreSQL supports grouped unique columns, as such:

CREATE TABLE example (
    a integer,
    b integer,
    c integer,
    UNIQUE (a, c)
);

See PostgreSQL Documentation - Section 5.3.3

Easy :-)

Jackelynjackeroo answered 25/7, 2011 at 20:47 Comment(1)
The unique part isn't my main concern, it's getting the data to be input in that way as if it were a sub-sequence.Consultant
R
-4

I don't have any postgresql-specific experience, but can you use a subquery in your insert statement? Something like, in Mysqlish,

INSERT INTO MYTABLE SET 
   ID=4, 
   SEQ=(  SELECT MAX(SEQ)+1 FROM MYTABLE WHERE ID=4  ),
   DATA="Quick brown fox, lorem ipsum, lazy dog, etc etc."
Renaud answered 25/7, 2011 at 20:49 Comment(2)
That syntax is invalid SQL. There is no SET for insertDygert
Actually this syntax is MySQL specific, but the main idea is hereSuwannee

© 2022 - 2024 — McMap. All rights reserved.