Postgres 'if not exists' fails because the sequence exists
I have several counters in an application I am building, and I am trying to get them created dynamically by the application as required.

For a simple example, suppose that when someone types a word into a script, it should return the number of times that word has been entered previously. Here is an example of the SQL that might be executed if they typed the word 'example':

CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
SELECT nextval('example');

This would return 1 the first time it ran, 2 the second time, etc.

The problem arises when two people click the button at the same time. First, please note that a lot more is happening in my application than just these statements, so the chances of them overlapping are much higher than they would be if this was all that was happening.

1> BEGIN;
2> BEGIN;
1> CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
2> CREATE SEQUENCE IF NOT EXISTS example START WITH 1; -- is blocked by previous statement
1> SELECT nextval('example')  -- returns 1 to user.
1> COMMIT;  -- unblocks second connection
2> ERROR:  duplicate key value violates unique constraint 
   "pg_type_typname_nsp_index"
   DETAIL:  Key (typname, typnamespace)=(example, 109649) already exists. 

I was under the impression that with "IF NOT EXISTS", the statement would simply be a no-op if the sequence already exists, but there seems to be a race condition where that is not the case. I call it a race condition because if the two statements are not executed at the same time, it works as one would expect.

I have noticed that IF NOT EXISTS is fairly new in Postgres, so maybe they haven't worked out all of the kinks yet?

EDIT: The main reason we were considering this approach was to avoid excess locking: if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence).

Committee answered 9/9, 2016 at 0:49 Comment(7)
I've never seen an application that makes DDL changes dynamically. Can you explain your overall architecture a bit more? Why do you need to create these sequences dynamically? – Woodley
I probably don't - we were just looking at our different options for handling this scenario. A table of sequences may be what we go with, but I wanted to consider some different approaches before choosing the most appropriate. Regardless of how we go, this still seems like erroneous behaviour to me... – Committee
Why would you re-create the sequence each time a user clicks a button? That does not make sense. The DDL will make the users wait much more. And I don't think that with concurrent access to a sequence the SQL statement has to wait; after all, that's what a sequence is for. They are non-transactional and don't need lock synchronisation (the creation does, though). Why don't you create the sequence with a cache value to increase performance under concurrent access? – Demaggio
"if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other" - that is a wrong assumption. – Demaggio
@a_horse_with_no_name - no, I would not re-create it. The idea behind IF NOT EXISTS is that nothing happens if it already exists. I also just tested my theory about whether users have to wait for each other, and confirmed that both were able to get a sequence number even though both transactions were still open - so it actually isn't a wrong assumption. – Committee
If you don't want to "re-create" it, then why do you run a CREATE SEQUENCE in the first place, triggered by a user action? Why not create it once with the rest of the tables and be done with it? – Demaggio
Because, as stated in the question, it's dynamic. I don't know what word a user is about to use in the script in the example - and the same is true for the more specific case that this is meant to represent. – Committee

Sequences are part of the database schema. If you find yourself modifying the schema dynamically based on the data stored in the database, you are probably doing something wrong.

This is especially true for sequences, which have special properties, e.g. regarding their behaviour with respect to transactions. Specifically, if you increment a sequence (with nextval) in the middle of a transaction and then roll that transaction back, the value of the sequence will not be rolled back. Most likely, this is not the behaviour you want for your data. In your example, imagine that a user tries to add a word, which increments the corresponding sequence. Now imagine that the transaction does not complete for some reason (e.g. the computer crashes) and gets rolled back. You would end up with the word not being added to the database, but with the sequence incremented anyway.
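A quick sketch in psql illustrating this (assuming a scratch database where a sequence named demo_seq does not yet exist):

CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq');  -- returns 1
ROLLBACK;

-- The increment from the rolled-back transaction is not undone:
SELECT nextval('demo_seq');  -- returns 2, not 1

This is by design: making nextval transactional would force concurrent transactions to wait on each other, which is exactly what sequences exist to avoid.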

For the particular example you mentioned, there is an easy solution: create an ordinary table to store all the "sequences". Something like this would do it:

CREATE TABLE word_frequency (
    word text NOT NULL UNIQUE,
    frequency integer NOT NULL
);

Now I understand that this is just an example, but if this approach doesn't work for your actual use case, let us know and we can adjust it to your needs.

Edit: Here's how the above solution works. When a new word is added, run the following query (the "UPSERT" ON CONFLICT syntax requires Postgres 9.5+):

INSERT INTO word_frequency(word,frequency)
VALUES ('foo',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;

This query will insert a new word into word_frequency with frequency 1, or, if the word already exists, increment the existing frequency by 1. Now what happens if two transactions try to do that at the same time? Consider the following scenario:

client 1          client 2
--------          --------
BEGIN
                  BEGIN
UPSERT ('foo',1)
                  UPSERT ('foo',1) <====
COMMIT
                  COMMIT

What will happen is that as soon as client 2 tries to increment the frequency for foo (marked with the arrow above), the operation blocks because the row was modified by a different transaction. When client 1 commits, client 2 is unblocked and continues without any errors. This is exactly how we wanted it to work. Also note that PostgreSQL uses row-level locking to implement this behaviour, so insertions for other words will not be blocked.

Tien answered 9/9, 2016 at 1:3 Comment(6)
I'm open to different options - and this is one that I have considered. Can you please add to your answer how the workflow I described works using tables? Specifically the scenario where a new word is being added by two separate transactions at the same time? In your example there are no constraints that forbid two entries of the same word, so a program run as above would result in 2 rows being added for the same word. Even with a unique constraint it results in a failure rather than a graceful incrementation... – Committee
Postgres recently added UPSERT-like functionality called ON CONFLICT. You can use this to build an atomic create-or-increment operation. – Vizzone
Sounds like a potential contender. Would you mind adding an answer showing how this would work with two transactions, in a similar way to what I have done above? – Committee
@shadow I've updated my answer with some more details. I've also added the UNIQUE I forgot earlier. – Tien
Looks really good. The only thing is, this construct doesn't actually return what the sequence is up to. Is doing a SELECT just after safe? – Committee
@shadow No, a SELECT after the upsert might return inaccurate results. It's better to use a RETURNING in the query itself; I've updated my answer. – Tien

EDIT: The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)

It sounds like you're optimizing for a problem that likely does not exist. Sure, if you have 100,000 simultaneous users who are only inserting rows (since that is normally the only time a sequence is used), there is the possibility of some contention on the sequence, but realistically there will be other bottlenecks long before the sequence gets in the way.

I'd advise you to first prove that the sequence is an issue. With a proper database design (which dynamic DDL is not), the sequence will not be the bottleneck.

As a reference, DDL is not transaction-safe in most databases (PostgreSQL, which supports transactional DDL for most statements, is a notable exception).
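If the dynamic creation really is unavoidable, one way to avoid the race is to serialize just the creation step. A sketch using PostgreSQL advisory locks (the lock key 42 is an arbitrary example; any application-chosen bigint that all creators agree on works):

BEGIN;
-- Only one transaction at a time may hold this advisory lock;
-- it is released automatically at COMMIT or ROLLBACK.
SELECT pg_advisory_xact_lock(42);
CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
COMMIT;

-- Once the sequence exists, nextval needs no extra locking:
SELECT nextval('example');

Only the rare transaction that actually creates a sequence pays the serialization cost; ordinary increments go straight to nextval.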

Woodley answered 9/9, 2016 at 4:0 Comment(1)
We have proven it to be an issue time and again, sadly. Our automated tests periodically hit it, and even in production we've seen it come up a few times. I can assure you that we wouldn't be looking into this if it wasn't coming up. As per the example, I'm not talking about inserting rows, but about having incrementers that get created when required. It's for the generation of unique incrementing barcodes (or a fragment of one, anyway), if you require a real use case. Also, this question is Postgres-specific; we don't need to worry about it being agnostic. If we change databases, we'll change approaches. – Committee

© 2022 - 2024 — McMap. All rights reserved.