Postgres deadlocks on concurrent upserts
Asked Answered
L

2

9

We have an application which reads from a data stream and upserts that information into a database. The data is changes which occur on Google Drive which means that many events which impact the same objects can occur very close to each other.

We're running into deadlocks when upserting this information into the database, here is what comes out in the log. I have reconstructed and sanitised the query for readability:

ERROR:  deadlock detected
DETAIL:  Process 10586 waits for ShareLock on transaction 166892743; blocked by process 10597.
  Process 10597 waits for ShareLock on transaction 166892741; blocked by process 10586.
  Process 10586: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z')
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

  Process 10597: 
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.167Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.167Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

HINT:  See server log for query details.
CONTEXT:  while locking tuple (3908269,11) in relation "documents"
STATEMENT:  
          INSERT INTO documents
              (version, source, source_id, ingestion_date)
          VALUES
              (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
              (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')

          ON CONFLICT (source, source_id)
          DO UPDATE
          SET
              ingestion_date = EXCLUDED.ingestion_date,
              version = documents.version + 1

          RETURNING source_id, source, uid

The schema:

    Column      |            Type             |                             Modifiers
----------------+-----------------------------+-------------------------------------------------------------------
uid             | uuid                        | not null default gen_random_uuid()
date_created    | timestamp without time zone | not null default now()
sequence_id     | bigint                      | not null default nextval('documents__sequence_id__seq'::regclass)
version         | integer                     | not null default 0
source          | text                        | not null
source_id       | text                        | not null
ingestion_date  | timestamp without time zone | not null

Indexes:
    "documents__pkey" PRIMARY KEY, btree (uid)
    "documents__sequence_id__unique" UNIQUE CONSTRAINT, btree (sequence_id)
    "documents__source__source_id__deleted" UNIQUE, btree (source, source_id)
    "documents__ingestion_date__idx" btree (ingestion_date)
    "documents__source_id__source__idx" btree (source_id, source)

I suspect the problem is something like "the first transaction was locking rows with source_id alpha, beta, gamma in sequence, meanwhile the second transaction was locking rows with source_id delta, gamma in the reverse order, and the deadlock happened at the point they both locked gamma and delta", however the timing here is very tight!

What would the solution for this be? Sorting our values list by source_id?

Largehearted answered 22/9, 2017 at 13:45 Comment(2)
Rob Hasswel, were you able to solve the case?Jenness
Can you reorder the order of the second insert, so that gamma comes before delta? Because it looks a lot like the situation of number 7 in the following link: dzone.com/articles/…Wien
W
6

I can think of three solutions:

  1. You insert only one row per statement, but that's inefficient.

  2. You sort the rows before inserting them.

  3. You retry a transaction if it gets a deadlock or serialization error.

I'd prefer the third solution unless the errors happen very often.

Wonderment answered 22/9, 2017 at 14:37 Comment(1)
2nd one does not work in case of upserts. We've been facing this issue for a long time.Tonicity
A
3

Your query's syntax allows ordering the values easily:

INSERT INTO documents
          (version, source, source_id, ingestion_date)
   SELECT * FROM (
      VALUES
          (0, 'googledrive', 'alpha', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'beta', '2017-09-21T07:03:51.074Z')
          (0, 'googledrive', 'gamma', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'delta', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'epsilon', '2017-09-21T07:03:51.074Z'),
          (0, 'googledrive', 'zeta', '2017-09-21T07:03:51.074Z')
      ) AS v ORDER BY source, source_id

      ON CONFLICT (source, source_id)

This should solve your problem. Performance should be nice, as the sort will be tiny.

Abbevillian answered 22/9, 2017 at 16:33 Comment(2)
Ordering values list in case of upserts does not work. We still have deadlocks even after rows were sorted at the application level.Tonicity
@Tonicity were you able to solve the case?Jenness

© 2022 - 2024 — McMap. All rights reserved.