Return rows from INSERT with ON CONFLICT without needing to update
Asked Answered
L

1

29

I have a situation where I very frequently need to get a row from a table with a unique constraint, and if none exists then create it and return. For example my table might be:

CREATE TABLE names(
    id SERIAL PRIMARY KEY,
    name TEXT,
    CONSTRAINT names_name_key UNIQUE (name)
);

And it contains:

id | name
 1 | bob 
 2 | alice

Then I'd like to:

 INSERT INTO names(name) VALUES ('bob')
 ON CONFLICT DO NOTHING RETURNING id;

Or perhaps:

 INSERT INTO names(name) VALUES ('bob')
 ON CONFLICT (name) DO NOTHING RETURNING id

and have it return bob's id 1. However, RETURNING only returns either inserted or updated rows. So, in the above example, it wouldn't return anything. In order to have it function as desired I would actually need to:

INSERT INTO names(name) VALUES ('bob') 
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = 'bob'
RETURNING id;

which seems kind of cumbersome. I guess my questions are:

  1. What is the reasoning for not allowing the (my) desired behaviour?

  2. Is there a more elegant way to do this?

Licentious answered 29/10, 2016 at 21:22 Comment(0)
I
37

It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.

WITH ins AS (
   INSERT INTO names(name)
   VALUES ('bob')
   ON     CONFLICT ON CONSTRAINT names_name_key DO UPDATE
   SET    name = NULL
   WHERE  FALSE      -- never executed, but locks the row
   RETURNING id
   )
SELECT id FROM ins
UNION  ALL
SELECT id FROM names
WHERE  name = 'bob'  -- only executed if no INSERT
LIMIT  1;

This way you do not actually write a new row version without need.

I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.

However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.

Proper solution for single-row UPSERT:

General solutions for bulk UPSERT:

Without concurrent write load

If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:

WITH ins AS (
   INSERT INTO names(name)
   VALUES ('bob')
   ON     CONFLICT ON CONSTRAINT names_name_key DO NOTHING  -- no lock needed
   RETURNING id
   )
SELECT id FROM ins
UNION  ALL
SELECT id FROM names
WHERE  name = 'bob'  -- only executed if no INSERT
LIMIT  1;
Inman answered 30/10, 2016 at 1:58 Comment(10)
Thank you for the answer! I see this seems like a 'better' way to do it, but I'm not sure what the practical difference is with the method I described?Licentious
@ira: I added some more explanation above.Inman
@ErwinBrandstetter -- does this always return an id? I tried and it didn't seem to work -- https://mcmap.net/q/28948/-insert-or-select-strategy-to-always-return-a-row/435563 -- perhaps I am doing something wrong?Walleye
Ok ... so the bug with the above is this, I think: SET name = NULL where false will not get a lock if name is already null.Walleye
Not the case, the confusion has been sorted out in https://mcmap.net/q/28948/-insert-or-select-strategy-to-always-return-a-row/435563.Inman
@ErwinBrandstetter Does the "regular" upsert have any race conditions or problems other than PostgeSQL creating a new version of the row behind the scenes? ie: INSERT INTO names(name) VALUES ('bob') ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE SET name = 'bob' RETURNING id;Candlepower
does having LIMIT 1 actually avoid the second SELECT query (from table) if there is already a row in the first SELECT query (from cte)?Freefloating
@CapiEtheriel: For this query, de-facto yes. But see: dba.stackexchange.com/q/316818/3684Inman
also... would wrapping both the INSERT and the SELECT in a single transaction be safe enough for concurrent queries? I see there's a lot of consideration about locks here, I wonder if it wouldn't be simpler to rely on transactions. (maybe slower?)Freefloating
@CapiEtheriel: If concurrency (locking) is a concern, rather consider this related answer like already suggested in my answer. Looping over separate SELECT and INSERT is the ultimate solution.Inman

© 2022 - 2024 — McMap. All rights reserved.