Deadlocks in PostgreSQL when running UPDATE
Asked Answered
D

3

42

I'm a little bit confused reading about PostgreSQL deadlocks.

A typical deadlock example is:

-- Transaction 1
UPDATE customer SET ... WHERE id = 1
UPDATE customer SET ... WHERE id = 2

-- Transaction 2
UPDATE customer SET ... WHERE id = 2
UPDATE customer SET ... WHERE id = 1

But what if I change the code as follows:

-- Transaction 1
UPDATE customer SET ... WHERE id IN (1, 2)

-- Transaction 2
UPDATE customer SET ... WHERE id IN (1, 2)

Will be a possibility of deadlock here?

Essentially my question is: in the 2nd case does PostgreSQL lock rows one-by-one, or lock the entire scope covered by the WHERE condition?

Thanks in advance!

Duala answered 20/4, 2012 at 11:41 Comment(1)
Your "typical deadlock example" block actually helped me to create a DB Deadlock in Postgres in Spring. I tried several different things to create one and this one finally enabled me to create one. Just wanted to see a deadlock in action and was finally able to produce in some test code.Sinusoidal
D
50

In PostgreSQL the rows will be locked as they are updated -- in fact, the way this actually works is that each tuple (version of a row) has a system field called xmin to indicate which transaction made that tuple current (by insert or update) and a system field called xmax to indicate which transaction expired that tuple (by update or delete). When you access data, it checks each tuple to determine whether it is visible to your transaction, by checking your active "snapshot" against these values.

If you are executing an UPDATE and a tuple which matches your search conditions has an xmin which would make it visible to your snapshot and an xmax of an active transaction, it blocks, waiting for that transaction to complete. If the transaction which first updated the tuple rolls back, your transaction wakes up and processes the row; if the first transaction commits, your transaction wakes up and takes action depending on the current transaction isolation level.

Obviously, a deadlock is the result of this happening to rows in different order. There is no row-level lock in RAM which can be obtained for all rows at the same time, but if rows are updated in the same order you can't have the circular locking. Unfortunately, the suggested IN(1, 2) syntax doesn't guarantee that. Different sessions may have different costing factors active, a background "analyze" task may change statistics for the table between the generation of one plan and the other, or it may be using a seqscan and be affected by the PostgreSQL optimization which causes a new seqscan to join one already in progress and "loop around" to reduce disk I/O.

If you do the updates one at a time in the same order, in application code or using a cursor, then you will have only simple blocking, not deadlocks. In general, though, relational databases are prone to serialization failures, and it is best to access them through a framework which will recognize them based on SQLSTATE and automatically retry the entire transaction from the start. In PostgreSQL a serialization failure will always have a SQLSTATE of 40001 or 40P01.

http://www.postgresql.org/docs/current/interactive/mvcc-intro.html

Dirichlet answered 20/4, 2012 at 12:14 Comment(4)
Thanks! So my example above can cause a deadlock (because we don't know the order, in which the rows are processed in both transactions)?Duala
It could cause a deadlock, although that would be rare; as opposed to the first example (explicitly choosing different orders), where it would be common. You could rule out deadlocks by taking a table-level lock of appropriate strength for the duration of every transaction which updates the table, but that cure may be worse than the disease. See the doc section I referenced for details.Dirichlet
But does PostgreSQL release lock after the row has been updated, but the whole UPDATE statement hasn't yet finished? In other words if we have a statement like UPDATE ... WHERE id IN (1,2,3,4,5) after postgresql updates, say, row with id=1 and proceeds with row with id=2, will it release the row id=1? if yes, how how will it roll the rows back if necessary?Duala
Locks are held until commit or rollback.Dirichlet
M
6

does PostgreSQL lock rows one-by-one, or lock the entire scope

PostgreSQL locks rows one-by-one.

And frustratingly, there is no way to order updates (or deletes) like there is for selects and inserts.

The solution is to lock the records upfront with SELECT FOR UPDATE and self-join.

UPDATE customer AS c SET ...
FROM (
  SELECT ctid
  FROM customer
  WHERE id IN (1, 2)
  ORDER BY id -- the optimal ordering varies, but it must be strict and consistent
  FOR UPDATE
) AS c2
WHERE c.ctid = c2.ctid

(Here I use the physical ID of the row ctid which can be slightly faster for the join.)

PostgreSQL will find the records, lock the records in order, and then update the records.

You can examine the query plan to convince yourself of this.

There is some overhead, but it's minimal, especially considering that UPDATE is typically not a light operation in the fist place.

Marquet answered 17/2, 2022 at 18:37 Comment(0)
A
1

Improve the SQL statement based on Paul Draper's solution:

UPDATE customer SET ...
WHERE id IN (
  SELECT id
  FROM customer
  WHERE id IN (1, 2)
  ORDER BY id
  FOR UPDATE
)

I did a test on PostgreSQL v10.1 with a shell script to loop the SQL statement 1000 times as background jobs. Where the original UPDATE command always run into the deadlock situation, both Paul Draper's command and mine were done successfully. There was no significant performance difference when I tested them on a small table with only two rows.

I don't have the knowledge of internal PostgreSQL implementation, which could affect the result if different versions have different optimization methods on the SQL commands.

Based on the Erwin Brandstetter's answer for a similar issue, this should work as well:

BEGIN;
SELECT id FROM customer WHERE id IN (1, 2)
  ORDER BY id
  FOR UPDATE;
UPDATE customer SET ... WHERE id IN (1, 2);
END;
Aksel answered 28/11, 2023 at 2:37 Comment(2)
It doesn't looks like improvement. AFAIK it's back again into the problem described in kgrittn answer.Brightman
@RabbanKeyak Thank for the comment. I have tested it successfully.Aksel

© 2022 - 2024 — McMap. All rights reserved.