Why write skew can happen in Repeatable reads?
Asked Answered
S

1

25

Wiki says;

Repeatable read:
In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so phantom reads can occur.

Write skew is possible at this isolation level, a phenomenon where two writes are allowed to the same column(s) in a table by two different writers (who have previously read the columns they are updating), resulting in the column having data that is a mix of the two transactions.

I'm curious about why write skew can happen in Repeatable reads? It says that it will keep read and write locks until the end of the transaction and the write skew happens when previously read the columns they are updating, so how can lock a write lock when a read lock is locked?

Syzygy answered 24/1, 2018 at 8:12 Comment(0)
A
43

Repeatable read isolation level guarantees that each transaction will read from the consistent snapshot of the database. In other words, a row is retrieved twice within the same transaction always has the same values.

Many databases such as Postgres, SQLServer in repeatable read isolation levels can detect lost update (a special case of write skew) but others don't. (i.e: InnoDB engine in MySQL)

We're back to write skew phenomena problem. There are situations that most database engines cannot detect in the repeatable read isolation. One case is when 2 concurrent transactions modifies 2 different objects and making race conditions.

I take an example from the book Designing Data-Intensive Application. Here is the scenario:

You are writing an application for doctors to manage their on-call shifts at a hospital. The hospital usually tries to have several doctors on call at any one time, but it absolutely must have at least one doctor on call. Doctors can give up their shifts (e.g., if they are sick themselves), provided that at least one colleague remains on call in that shift

The next interesting question is how we can implement this under databases. Here is pseudocode SQL code:

BEGIN TRANSACTION;
    SELECT * FROM doctors
        WHERE on_call = true
        AND shift_id = 1234;
    if (current_on_call >= 2) {
        UPDATE doctors
        SET on_call = false WHERE name = 'Alice' AND shift_id = 1234;
    }
COMMIT;  

Here is the illustration: Flow Data

As the above illustration, we see that Bob and Alice run above SQL code concurrently. However Bob and Alice modify different data, Bob modified Bob's record and Alice modified Alice's record. Databases at repeatable-read isolation level no way can know and check the condition (total doctor >= 2) has been violated. Write skew phenomena has happened.

To solve this problem, there are 2 methods proposed:

  1. locks all records that are being called manually. So either Bob or Alice will wait until other finishes transaction.

Here is some pseudocode using SELECT .. FOR UPDATE query.

BEGIN TRANSACTION;
    SELECT * FROM doctors
        WHERE on_call = true
        AND shift_id = 1234 FOR UPDATE; // important here: locks all records that satisfied requirements.

    if (current_on_call >= 2) {
        UPDATE doctors
        SET on_call = false WHERE name = 'Alice' AND shift_id = 1234;
    }
  COMMIT;  
  1. Using a more strict isolation level. Both MySQL, Postgres T-SQL provides serialize isolation level.
Aboral answered 9/9, 2018 at 15:14 Comment(8)
Thanks for a very clear explanation. I have one remark. You said that MySQL Serializeable IL might prevent the Write Skew phenomena by default, right ?The MySQL manual states that in Serializeable IL InnoDB implicitly converts all plain SELECT statements to SELECT ... FOR SHARE. I think that means that even with Serializeable IL in MySQL, pure 2PL shared lock can't prevent the write skew in the described scenario as both transactions can still read the rows and evaluate the if statement in parallelBaptism
Continuing my previous comment: actually you wrote everything right. In case of Serializable isolation level in MySQL, assuming the same interleaving of transaction statements as in your example, both transactions will acquire S lock on every doctor record, then both will evaluate current_on_call >= 2 statement resulting to true, then both transactions will be blocked as for executing Update statement they need X lock but they can't acquire it as the other transaction holds S locks on all doctor records. Result - Deadlock. One transaction will be rolled back- Write Skew preventedBaptism
I have written a blog to test isolation level in MariaDB/MySQL. medium.com/@huynhquangthao/… The important part in this test in the test for Repeatable Read and Serializable. After testing with MySQL, I see that: - Repeatable Read isolation can detect phantom read. But you can still update on record created by different transaction. - Serializable isolation can prevent this behavior: different transaction cannot modify data that might be change the result of other transaction.Aboral
that explanation seems strange - isn't repeatable read isolation supposed to give the same result to a select that was executed in a transaction? The value of "current_on_call " shouldn't change then? My basic testing with SQL Server shows that the update is locked in repeatable read.Pop
I think I figured that out - it depends on implementation details of Repeatable read isolation. Postgres for example is based on MVCC, so it indeed can't prevent the described write skew, but SQL Server is based on locking (2PL), so it does prevent it,as far as I can see.Pop
"a row is retrieved twice within the same transaction always has similar values." - similar? Not the same? (assuming there was no UPDATE within the transaction - but then it might not be similar at all)Grenadine
@MartinThoma Thanks for your suggestion. Using "same" here is more suitable.Aboral
thanks, may I ask what tool is your picture drew by?Hollishollister

© 2022 - 2024 — McMap. All rights reserved.