In Postgres, is there a need to lock a row in a table using FOR UPDATE if the query is contained in a transaction with serializable isolation level? In other words, does serializable transaction lock the rows by itself until the transaction is committed?
Good question. The answer is that you don't need to use SELECT ... FOR NO KEY UPDATE to avoid anomalies (effects of race conditions) on the SERIALIZABLE isolation level, but that it can still be useful to lock rows explicitly to avoid serialization errors.
The design principle of SERIALIZABLE is that you don't have to worry about what concurrent transactions do: you are guaranteed that if your transaction succeeds, the result will be safe from race conditions. To achieve that, the database can abort your transaction with a serialization error if it cannot find any other way to keep that promise. If that happens, you have to retry the transaction until it succeeds.
To provide a concrete example: if a concurrent transaction modifies a table row between the time your transaction reads the row and the time it tries to update the row, you will receive a serialization error. If that is a frequent occurrence, it can be quite annoying, because it means that you will have to retry the transaction frequently. This is bad for your overall performance.
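As an illustration, here is a minimal sketch of how that error shows up, assuming a hypothetical table accounts(id, balance) and two concurrent sessions:

-- Session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;

-- Session 2 updates the same row and commits in the meantime:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;

-- Back in session 1, the update now fails and the whole transaction must be retried:
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
-- ERROR:  could not serialize access due to concurrent update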
In such situations, it can be useful to explicitly lock the row against concurrent modifications using SELECT ... FOR NO KEY UPDATE (yes, that is the correct lock if you intend to UPDATE; SELECT ... FOR UPDATE is only appropriate if you intend to DELETE the row or to update a column with a unique constraint). Such pessimistic locking will reduce concurrency, but can avoid the serialization error, so it can be a good choice for highly contended data.
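A sketch of that pessimistic variant, reusing the hypothetical accounts table from above: by taking the row lock up front, a concurrent writer blocks until this transaction finishes instead of causing a serialization error here.

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- lock the row before reading its current state; a concurrent UPDATE of this
-- row now waits for us instead of sneaking in between our SELECT and UPDATE
SELECT balance FROM accounts WHERE id = 1 FOR NO KEY UPDATE;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
COMMIT;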
Did you mean to write "transaction is committed" rather than "transaction is completed"?
Please be aware that isolation levels apply per transaction, and some serialization checks only fail when the transaction commits. By default, auto-commit is ON, which means that each SQL statement you execute runs in its own transaction and is committed automatically. If auto-commit is OFF, or if you start a transaction explicitly with BEGIN, the statements are grouped into one transaction that must end with a COMMIT for the isolation level to function as documented by Postgres. Only at COMMIT do the transaction's changes become visible to other sessions and take their place in the commit order.
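As a sketch (again using the hypothetical accounts table), this is what grouping several statements into one explicitly committed SERIALIZABLE transaction looks like, as opposed to letting autocommit wrap each statement separately:

BEGIN ISOLATION LEVEL SERIALIZABLE;   -- start one explicit transaction
SELECT balance FROM accounts WHERE id = 1;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
COMMIT;   -- only now do the changes become visible to other sessions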
Also be aware that isolation levels exist to let transactions be executed in parallel (at the same time) by different sessions/users; that is when locks become necessary. If you build your application code so that all database work runs strictly in sequence, isolation levels are not even needed. The level is called serializable because the result is guaranteed to be equivalent to some serial order of execution; according to @Laurenz Albe, that order is determined by more than just the commit timestamp. If you have a web application, for example, multiple sessions will exist, sometimes even for the same user, changing data "almost" simultaneously (since nothing is truly parallel in the database world), and then isolation levels are critical.
Postgres handles database locks correctly (in theory, if the source code is 100% perfect) based on the isolation level. Serializable is the strictest level, so the thousands of different anomaly scenarios that are possible (across the four categories of phenomena) should be avoided and locks handled properly. But we all know that software is rarely perfect.
https://www.postgresql.org/docs/current/transaction-iso.html
Whether FOR UPDATE will be handled correctly in your case is hard to answer without testing. Take a look at this article:
http://rhaas.blogspot.com/2011/01/locking-in-postgresql.html
As an alternative to relying on isolation levels, you can take your own locks using explicit locking:
https://www.postgresql.org/docs/current/explicit-locking.html
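A rough sketch of that alternative (table and lock key are made up for illustration): you can serialize access yourself with a table-level lock or an advisory lock instead of relying on the isolation level alone.

BEGIN;
-- coarse-grained: block concurrent writers to the whole table until COMMIT
LOCK TABLE accounts IN SHARE ROW EXCLUSIVE MODE;
-- ... read and modify accounts ...
COMMIT;

BEGIN;
-- or an application-defined advisory lock on an arbitrary key,
-- released automatically at the end of the transaction
SELECT pg_advisory_xact_lock(42);
-- ... do the critical work ...
COMMIT;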
Scenarios that often cause issues with serializable are when primary and foreign keys are not implemented, or when an index on a large table is still being built and is therefore not yet usable. So referential integrity is a must when dealing with serializable. In the past, I dealt with a hierarchy of over 200K records where locks were needed in the middle of the hierarchy, and the hierarchy was soft coded rather than hard coded, meaning the parent and child ids are stored in the same table. That is a case where locks are not handled the way you expect, because the isolation level works with the structures Postgres was designed around, not with every design used in the real world. In that case, FOR UPDATE will not lock what you intend, not because the Postgres code behind the scenes is wrong, but because the meaning of a primary/foreign-key relationship in a soft-coded hierarchy table does not conform to the practices Postgres recommends.
What I'd recommend is reading and implementing the best practices from the PostgreSQL documentation, especially for your database backup procedures. Also test actual restores from backup and verify the referential integrity of the database afterwards. Those areas, rather than your SQL code itself, are the likely culprits of future issues with this isolation level.
So yes, serializable isolation "should" handle the row-level locking for you, subject to the caveats above.