Why does inserting a row with a foreign key referencing a row by pk modified in another snapshot isolation transaction cause the transaction to hang?
Asked Answered
N

1

7

I ran into an interesting problem in a system where due to a schema change, a first database transaction in a single thread blocks a second database transaction from completing, until a timeout occurs.

To test this I created a test database:

CREATE DATABASE StackOverflow
GO

USE StackOverflow

ALTER DATABASE StackOverflow SET ALLOW_SNAPSHOT_ISOLATION ON
ALTER DATABASE StackOverflow SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE
GO

CREATE TABLE One (
    Id int CONSTRAINT pkOne PRIMARY KEY,
    A varchar(10) NOT NULL
)

CREATE TABLE Two (
    Id int CONSTRAINT pkTwo PRIMARY KEY,
    B varchar(10) NOT NULL,
    OneId int NOT NULL CONSTRAINT fkTwoToOne REFERENCES One
)
GO

-----------------------------------------------

CREATE TABLE Three (
    Id int CONSTRAINT pkThree PRIMARY KEY,
    SurrogateId int NOT NULL CONSTRAINT ThreeSurrUnique UNIQUE,
    C varchar(10) NOT NULL
)
GO

CREATE TABLE Four (
    Id int CONSTRAINT pkFour PRIMARY KEY,
    D varchar(10) NOT NULL,
    ThreeSurrogateId int NOT NULL CONSTRAINT fkFourToThree REFERENCES Three(SurrogateId)
)
GO

--Seed data
INSERT INTO One (Id, A) VALUES (1, '')
INSERT INTO Three (Id, SurrogateId, C) VALUES (3, 50, '')

In this first test, a transaction modifying a row in table One is started, but not yet committed. Another transaction is inserting into table Two, with a column referencing the same row being modified in the first transaction in table One. The second transaction will hang forever until the first transaction is committed.

SQL Management Studio testing transaction hang

The reason the transaction waits is due to a LCK_M_S keylock held by the first transaction.

SQL Management Studio Activity Monitor LCK_M_S keylock

In my second test, a transaction modifying a row in table Three is started, but not yet committed, just as in the first test. Another transaction is inserting into table Four, with a column referencing the same row being modified in the first transaction in table Three. Except this time, table Four references a surrogate key in table Three instead of the primary key. The transaction completed immediately and is unaffected by the first transaction.

SQL Management Studio testing transaction hang without issue

I need help understanding why the the latter transaction is always blocked by the former transaction when inserting a row in a separate table that references a table that was modified in the first transaction. I think the obvious unhelpful answer is because of the foreign key constraint. But why? Especially because this is snapshot isolation, why does the latter transaction care about the former at all? The row it's referencing already exists and the foreign key can easily be verified, as proven by the second test where a foreign key referencing a surrogate key completes without obstruction.

Nebulous answered 9/11, 2018 at 15:37 Comment(0)
H
6

The answer is rather simple.

When a query reads to validate Foreign Key constraints they always use locks, never row versioning. Imagine if one transaction is changing a PK value, and a concurrent session inserted a row referencing the old PK value. It isn't allowed to validate the FK constraint based on the consistent version of the row in the version store. If it were, then all the FK's would have to be validated again when the PK change was committed.

In the first case the update transaction has a key lock on the target index of the FK, so the concurrent session can't read the PK values.

In the second, the update doesn't affect the unique key involved in the FK. The update is able to place a Shared lock on the target key value, since the updating session has an exclusive key lock on a key in a different unique index.

In the first example after the first transaction commits, the second one fails with a snapshot isolation update conflict:

Msg 3960, Level 16, State 2, Line 10 Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table 'dbo.One' directly or indirectly in database 'StackOverflow' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.

This is because in SNAPSHOT isolation you cannot read a row that has changed since your transaction started. And since the FK validation can't use row versions, it needs to read the PK from a row that was updated after its transaction started. This is a violation of SNAPSHOT isolation as that PK value might not have existed at the beginning of the SNAPSHOT transaction.

This can be a little tricky to see, as the SNAPSHOT transaction doesn't really start at the point-in-time when you run BEGIN TRANSACTION (kind of like IMPLICIT TRANSACTIONS) the relevant point-in-time is the one when the transaction first reads or changes the database. EG

if @@trancount > 0 rollback
go
set transaction isolation level snapshot
begin transaction

drop table if exists t
create table t(id int)

--in another session run
--update one set a = a+'b' where id = 1

waitfor delay '0:0:10'

insert into two(id,b,oneid) values (2,'',1) -- fails
Husserl answered 9/11, 2018 at 15:55 Comment(9)
Why does SQL Server need to acquire a key lock in the first case? In both cases, the key being referenced is entirely unmodified.Nebulous
Because modifying a row in a table with a clustered index, requires an X key lock on the clustered index key. If [one] was a heap with a non-clustered PK then it wouldn't block, as the unique index would be in a different data structure. Kind of like your second case.Husserl
@DavidBrowne-Microsoft "if it were, then all FKs would have to be validated again when the PK change was committed" is exactly what I would expect to be happening. I'd expect when it went to commit, it would realize the problem, throw an update-conflict error and roll back its change, restoring things to a consistent state. If I do accept, as unintuitive as that is to me, that things are happening as you say and are the reason for the problem, then I do not understand at all how this problem magically goes away if the PK happens to not be the clustered index.Floccule
Checking constraints on commit, sometimes called "deferred constraint checking" only happens in SQL Server with In-Memory Tables, which don't use locks at all. And as for the non-clustered index, here, there's just more "lock granularity". Since the a row's key and the row's data row are on separate data structures, protected by separate locks, SQL Server can tell that the update doesn't effect the key, and so needn't block the insert. With a Clustered Index the Row and the PK Key are protected by a single lock, so any change to the PK or non-PK column requires an exclusive Key Lock.Husserl
@DavidBrowne-Microsoft Last question: In the first case, why does the second transaction fail after the first transaction is committed? The PK wasn't updated: there wasn't actually a conflict.Nebulous
The first transaction is simply blocked until the first transaction commits or rolls back. At that point the second transaction can check the PK value, and will continue normally if the target PK row wasn't deleted (or its PK value updated).Husserl
But in the first gif we can see that the transaction aborts due to a conflict. The target PK row wasn't deleted and PK value was not updated.Nebulous
see update for an explanation of the SNAPSHOT update conflict.Husserl
More information can be found here: sqlperformance.com/2014/06/sql-performance/…Nebulous

© 2022 - 2024 — McMap. All rights reserved.