Avoiding PostgreSQL deadlocks when performing bulk update and delete operations

We have a single table which does not have references to any other tables.

┌───────────────┬───────────────┬─────────────────┬─────────────────┐
│ id_A (bigint) │ id_B (bigint) │ val_1 (varchar) │ val_2 (varchar) │
└───────────────┴───────────────┴─────────────────┴─────────────────┘

The primary key of the table is a composite of id_A and id_B.

Reads and writes of this table are highly concurrent and the table has millions of rows. We have several stored procedures which do mass updates and deletes. Those stored procedures are being called concurrently mainly by triggers and application code.

The operations usually look like the following, and each one can match thousands of rows to update or delete:

DELETE FROM table_name 
WHERE id_A = ANY(array_of_id_A)
AND id_B = ANY(array_of_id_B)

UPDATE table_name
SET val_1 = 'some value', val_2 = 'some value'
WHERE id_A = ANY(array_of_id_A)
AND id_B = ANY(array_of_id_B)

We are experiencing deadlocks, and none of our attempts at explicit locking (row-level locks using SELECT FOR UPDATE, and table-level locks) have resolved them. (Note that we cannot use access exclusive locking on this table because of the performance impact.)

Is there another way that we could try to solve these deadlock situations? The reference manual says:

The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order.

But how could we achieve this in the above scenario? Is there a guaranteed way to do bulk update and insert operations in a particular order?

Chart answered 19/11, 2014 at 1:5

Use explicit row-level locking in ordered subqueries in all competing queries.
(SELECT does not compete with write-locks.)

DELETE

DELETE FROM table_name t
USING (
   SELECT id_A, id_B
   FROM   table_name 
   WHERE  id_A = ANY(array_of_id_A)
   AND    id_B = ANY(array_of_id_B)
   ORDER  BY id_A, id_B
   FOR    UPDATE
   ) del
WHERE  t.id_A = del.id_A
AND    t.id_B = del.id_B;

UPDATE

UPDATE table_name t
SET    val_1 = 'some value'
     , val_2 = 'some value'
FROM  (
   SELECT id_A, id_B
   FROM   table_name 
   WHERE  id_A = ANY(array_of_id_A)
   AND    id_B = ANY(array_of_id_B)
   ORDER  BY id_A, id_B
   FOR    NO KEY UPDATE  -- Postgres 9.3+
-- FOR    UPDATE         -- for older versions or updates on key columns
   ) upd
WHERE  t.id_A = upd.id_A
AND    t.id_B = upd.id_B;

This way, rows are locked in consistent order as advised in the manual.

Assuming that id_A and id_B are never updated, even rare corner-case complications like those detailed in the "Caution" box in the manual are not possible.

As long as you are not updating key columns, you can use the weaker lock mode FOR NO KEY UPDATE, which requires Postgres 9.3 or later.
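
To illustrate why a consistent lock order helps, here is a minimal simulation in Python rather than SQL, so it runs without a database. Everything in it is hypothetical: row_locks, bulk_update and run_workers are names made up for this sketch, and the threading locks only stand in for PostgreSQL row locks without modeling its lock manager.

```python
import threading

# Hypothetical stand-ins for row locks: one lock per row id.
row_locks = {row_id: threading.Lock() for row_id in range(1, 5)}


def bulk_update(row_ids, done, name):
    # Sort first so every worker acquires its locks in the same order,
    # mirroring ORDER BY ... FOR UPDATE in the locking subquery.
    ordered = sorted(row_ids)
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        # The "update" happens while all needed locks are held.
        done.append(name)
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()


def run_workers():
    done = []
    # The two workers touch overlapping rows, listed in different orders.
    a = threading.Thread(target=bulk_update, args=([3, 1, 2], done, "A"))
    b = threading.Thread(target=bulk_update, args=([4, 2, 3], done, "B"))
    a.start()
    b.start()
    a.join(timeout=5)
    b.join(timeout=5)
    return sorted(done)
```

Because both workers sort before locking, whichever one takes the lowest shared row first can always go on to take the rest, so the other simply waits instead of holding a lock the first one needs. Without the sorted() call, worker A could hold row 3 while waiting for row 2, and worker B could hold row 2 while waiting for row 3.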


The other (slow and sure) option is to use the Serializable Isolation Level for competing transactions. You would have to prepare for serialization failures, in which case the whole transaction has to be retried.
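
A retry loop of that kind lives in the client, not in the SQL itself. The following is a rough sketch in Python under stated assumptions: SerializationFailure and run_with_retry are hypothetical names, and the sqlstate attribute is an assumption about how the error code is exposed, since real drivers differ in how they report it. The SQLSTATE values are the ones PostgreSQL documents for serialization_failure (40001) and deadlock_detected (40P01).

```python
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


# SQLSTATE codes worth retrying: serialization_failure, deadlock_detected.
RETRYABLE = {"40001", "40P01"}


def run_with_retry(txn, max_attempts=5, base_delay=0.01):
    """Call txn() until it succeeds or the attempts run out.

    txn is assumed to run one complete transaction (BEGIN ... COMMIT)
    and to roll back before it raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            code = getattr(exc, "sqlstate", None)
            # Re-raise anything that is not retryable, or the final failure.
            if code not in RETRYABLE or attempt == max_attempts:
                raise
            # Brief, growing pause so competing transactions can finish.
            time.sleep(base_delay * attempt)
```

The important design point is that txn covers the whole transaction: after a serialization failure, all of its statements have to run again, not just the one that failed.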

Shrine answered 19/11, 2014 at 11:33 Comment(6)
Thank you for the reply @Erwin. I have used your strategy but I am still getting the deadlocks. I can't see what's obviously wrong with the approach. I'd expect the rows to be locked in order. Any more suggestions to try out will be greatly appreciated. - Chart
@sanjayav: Did you analyze the logs to see which relations and queries are actually involved? Maybe you have additional queries you forgot to adapt? Are there triggers involved that access involved tables out of order? - Shrine
I've been tearing my hair out on a deadlock my team has been facing, and this was the ticket, thanks @ErwinBrandstetter! - Papoose
It seems to me that the key factor is to sort by a unique id before the update. Assume that thread A updates rows 1, 2, 3 and thread B updates rows 2, 3, 4. Thread A may lock row 1 and row 2; in this case thread B will wait for A. Or thread A may lock only row 1 before B locks row 2; in this case A will wait for thread B. - Scudo
Yes, consistent sort order is the cheapest defense against deadlocks. - Shrine
Consider a table contact with two different unique indexes, on email and phone_number. How can a consistent order be used to prevent the deadlock? ORDER BY email, phone_number does not solve the issue, as we may have, for example: transaction1: (email1, phone1), (email2, phone2); transaction2: (email3, phone2), (email4, phone1). - Brunner
