SQL Server Deadlock Fix: Force join order, or automatically retry?

i have a stored procedure that performs a join of TableB to TableA:

 SELECT <--- Nested <--- TableA
             Loop   <--- TableB

At the same time, in a transaction, rows are inserted into TableA, and then into TableB.

This situation occasionally causes deadlocks: the stored procedure's SELECT locks rows in TableB while the INSERT locks rows in TableA, and then each wants the lock that the other is holding:

INSERT     SELECT
=========  ========
Lock A     Lock B
Insert A   Select B
Want B     Want A
....deadlock...

Logic requires the INSERT to first add rows to A, and then to B, while i personally don't care about the order in which SQL Server performs its join - as long as it joins.

The common recommendation for fixing deadlocks is to ensure that everyone accesses resources in the same order. But in this case SQL Server's optimizer is telling me that the opposite order is "better". i can force another join order, and get a worse-performing query.
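For illustration, forcing the written join order might look like this (a sketch; the column names are placeholders, not from my actual procedure):

    SELECT a.Id, b.Detail
    FROM TableA a
    INNER JOIN TableB b ON b.AId = a.Id
    OPTION (FORCE ORDER);  -- join in the order written: TableA first, then TableB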

But should i?

Should i override the optimizer, now and forever, with a join order that i want it to use?

Or should i just trap native error 1205 (deadlock victim), and resubmit the select statement?

The question isn't how much worse the query might perform when i override the optimizer and force it to do something non-optimal. The question is: is it better to automatically retry, rather than running worse queries?

Catapult answered 4/3, 2010 at 20:13 Comment(0)

It is better to automatically retry after deadlocks. The reason is that you may fix this deadlock, only to hit another one later. The behavior may change between SQL Server releases, if the size of the tables changes, if the server hardware specifications change, and even if the load on the server changes. If the deadlock is frequent, you should take active steps to eliminate it (an index is usually the answer), but for rare deadlocks (say, one every 10 minutes or so), a retry in the application can mask the deadlock. You can retry reads as well as writes, since the writes are, of course, surrounded by proper BEGIN TRANSACTION/COMMIT TRANSACTION to keep all write operations atomic, and hence you are able to retry them without problems.
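A minimal retry sketch in T-SQL (SQL Server 2005+ TRY/CATCH; the retry count, the back-off delay, and the transaction body are illustrative):

    DECLARE @retries int;
    SET @retries = 3;
    WHILE @retries > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;
            -- ... the INSERTs into TableA then TableB (or the SELECT) go here ...
            COMMIT TRANSACTION;
            BREAK;  -- success: leave the retry loop
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
            IF ERROR_NUMBER() = 1205 AND @retries > 1
            BEGIN
                SET @retries = @retries - 1;   -- deadlock victim: try again
                WAITFOR DELAY '00:00:00.200';  -- brief back-off before retrying
            END
            ELSE
            BEGIN
                SET @retries = 0;              -- any other error, or out of retries: give up
                RAISERROR('Transaction failed after retries.', 16, 1);
            END
        END CATCH
    END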

Another avenue to consider is turning on read committed snapshot. When this is enabled, SELECT will not take shared locks at all (it reads row versions instead), yet still yields consistent reads.
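Read committed snapshot is a database-level switch (flipping it requires no other active connections, so it is usually done in a maintenance window; the database name is a placeholder):

    ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;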

Charlottcharlotta answered 4/3, 2010 at 22:20 Comment(4)
Remus, retrying reads makes perfect sense, but automatically retrying writes after deadlocks leads to lost updates. When you write and become a deadlock victim, it is likely that the data you are going to touch has been modified by someone else. We should not automatically rewrite in such cases. We should re-read possibly modified data, and consider again whether we want to save. Makes sense?Bellda
@AlexKuznetsov: That's absurd, if a transaction is written properly (i.e. atomically) then how could retrying it possibly result in a lost update? I'm giving this +1, it's definitely the right answer. You can't stop every deadlock, it's just part of the background noise with ACID semantics.Tolan
@Alex, @Aaro: You are actually both right. By 'retry' I mean indeed 'read current status, apply changes, write back new status'. For automated processing applications, this is a very easy pattern to achieve. However, for user-interactive applications this can be harder, and often the proper action is to push back the 'write' by re-reading the current state and re-displaying it to the user, so he/she can confirm that the applied changes still make sense in the new state/context - and I think this is what Alex had in mind. The correct action therefore varies from case to case.Charlottcharlotta
i commented on Alex's comment on the other answer: but i'll repeat. Retrying an update is the same as the user originally clicking the "Save" button 200ms later. If they had just waited a little bit, there would not have been any deadlock.Catapult

To avoid deadlocks, one of the most common recommendations is "to acquire locks in the same order" or "access objects in the same order". Clearly this makes perfect sense, but is it always feasible? Is it always possible? I keep encountering cases when I cannot follow this advice.

If I store an object in one parent table and one or more child ones, I cannot follow this advice at all. When inserting, I need to insert my parent row first. When deleting, I have to do it in the opposite order.
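To make the order concrete, a trivial sketch with hypothetical Parent/Child tables:

    -- saving: the parent row must exist before its children
    INSERT INTO Parent (ParentId, Name)    VALUES (1, 'example');
    INSERT INTO Child  (ChildId, ParentId) VALUES (10, 1);

    -- deleting: the children must go first - the opposite lock order
    DELETE FROM Child  WHERE ParentId = 1;
    DELETE FROM Parent WHERE ParentId = 1;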

If I use commands that touch multiple tables or multiple rows in one table, then usually I have no control over the order in which locks are acquired (assuming that I am not using hints).

So, in many cases trying to acquire locks in the same order does not prevent all deadlocks, and we need some way of handling deadlocks anyway - we cannot assume that we can eliminate them all. Unless, of course, we serialize all access using Service Broker or sp_getapplock.
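For completeness, serializing with sp_getapplock looks roughly like this (the resource name is arbitrary and application-defined):

    BEGIN TRANSACTION;
    DECLARE @rc int;
    EXEC @rc = sp_getapplock
        @Resource    = 'TableA_TableB_writer',  -- any agreed-upon name
        @LockMode    = 'Exclusive',
        @LockOwner   = 'Transaction',           -- released at commit/rollback
        @LockTimeout = 5000;                    -- milliseconds
    IF @rc >= 0
    BEGIN
        -- ... the writes to the parent and child tables run serialized here ...
        COMMIT TRANSACTION;
    END
    ELSE
        ROLLBACK TRANSACTION;  -- lock not granted in time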

When we retry after deadlocks, we are very likely to overwrite other processes' changes; we need to be aware that someone else has probably modified the data we intended to modify. Especially if all the readers run under snapshot isolation, readers cannot be involved in deadlocks at all, which means that all the parties in a deadlock are writers that modified, or attempted to modify, the same data. If we just catch the exception and automatically retry, we can overwrite someone else's changes.

This is called a lost update, and it is usually wrong. Typically the right thing to do after a deadlock is to retry at a much higher level - re-select the data and decide whether to save in the same way the original decision to save was made.

For example, if a user pushed a Save button and the saving transaction was chosen as a deadlock victim, it might be a good idea to re-display the data on the screen as of after the deadlock.
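One common way to implement that higher-level retry is a rowversion column, so the save only succeeds against the state the user actually saw (a sketch against a hypothetical Orders table with a RowVer rowversion column):

    DECLARE @id int, @newTotal money, @ver binary(8);
    SET @id = 42;           -- example key
    SET @newTotal = 100.00; -- the user's edit

    -- 1. re-read the current state (and re-display it to the user)
    SELECT @ver = RowVer FROM Orders WHERE OrderId = @id;

    -- 2. save only if the row is still exactly what the user looked at
    UPDATE Orders
    SET    Total = @newTotal
    WHERE  OrderId = @id
      AND  RowVer  = @ver;

    IF @@ROWCOUNT = 0
        RAISERROR('Order was changed by someone else; re-display and confirm.', 16, 1);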

Bellda answered 5/3, 2010 at 1:32 Comment(7)
+1 this is true in interactive applications: if a write had deadlocked, it is very likely that the state which was being updated had changed, since that is exactly the resource the deadlock occurred on. My answer was influenced by my background in queue processing, where the 'higher level' is contained in the transaction that rolls back.Charlottcharlotta
@AlexKuznetsov: i don't agree so much about the peril of retrying an update. If the user happened to click the button 200ms later, rather than sooner, the effect would have been the same.Catapult
If your application is already designed to support optimistic concurrency, then it makes sense to treat a deadlock as a conflict. If the application would have overwritten changes anyway, then you might as well just retry the update.Tolan
@Ian: If the user happened to click the button 200ms later, rather than sooner, the system might detect a potential lost update and bounce the modification.Bellda
@Alex: You mean my system? Where last person to click Save wins?Catapult
@Ian. I do not know your requirements. Yet in many cases the "last person to click Save wins" approach is not acceptable.Bellda
@Alex. Sometimes locking is wanted. But mostly the system behaves the way everyone expects it to behave: as if you had saved any other file.Catapult

Trapping and rerunning can work, but are you sure that the SELECT is always the deadlock victim? If the insert is the deadlock victim, you'll have to be much more careful about retrying.

The easiest solution in this case, I think, is to NOLOCK or READUNCOMMITTED (same thing) your select. People have justifiable concerns about dirty reads, but we've run NOLOCK all over the place for higher concurrency for years and have never had a problem.
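For reference, the hint form is just this (a sketch with placeholder columns; dirty reads are possible, as noted):

    SELECT a.Id, b.Detail
    FROM TableA a WITH (NOLOCK)
    JOIN TableB b WITH (NOLOCK) ON b.AId = a.Id;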

I'd also do a little more research into lock semantics. For example, I believe that if you set the transaction isolation level to SNAPSHOT (requires SQL Server 2005 or later), your problems go away.
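Snapshot isolation is enabled per database and then requested per session (names are placeholders):

    ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- then, in the reading session:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SELECT a.Id, b.Detail
    FROM TableA a
    JOIN TableB b ON b.AId = a.Id;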

Yalu answered 4/3, 2010 at 21:57 Comment(7)
SQL Server rolls back the transaction with the least amount of held resources. The "insert" is a transacted series of maybe a dozen inserts. The select is a lone select (wrapped in a stored procedure)Catapult
@Ian Boyd: A single SELECT statement cannot create a deadlock situation. You need to have at least two multi-statement transactions. They don't both need to be DML, but they both must be waiting for locks on each other's resources and that means they both must use at least two resources. If it's really just a single SELECT statement, not wrapped in any larger transaction, then it might not be a real deadlock, it might just be the I/O system struggling to keep up or some other odd server issue.Tolan
@Aaronaught. A single select can cause a deadlock with another process (blogs.msdn.com/bartd/archive/2006/09/25/770928.aspx)Catapult
And in 30 days i recorded 140 deadlock victims. 138 of them were on one select statement. 1 was on another select statement. And 1 was on a transacted multi-statement update. So, to answer your question ("are you sure that the SELECT is always the deadlock victim?"): i'm pretty sure. 99.2857% sure.Catapult
@Ian Boyd: Interesting, I've never experienced that, learned something today. It does seem to require a highly specific combination of circumstances with respect to indexing and query structure. Of course the SELECT statement will always be chosen as the victim because it is seen as the less "important" transaction. I'm actually surprised that you saw even 1 instance of UPDATE being the victim, but then that's the thing about deadlocks, they're extremely unpredictable and you need to be prepared for them somehow.Tolan
@Aaronaught: For the UPDATE that was killed: i cannot say what the other process was doing. Presumably it wasn't a simple select.Catapult
@IanBoyd How do you get more info about a deadlock? sp_lock, sp_who, sp_who2, select * from sys.dm_tran_locks?Rattlesnake
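(One era-appropriate answer to that: trace flag 1222 writes the full deadlock graph to the SQL Server error log, and on SQL Server 2008 and later the built-in system_health Extended Events session also records xml_deadlock_report events. Note the trace flag applies server-wide:)

    DBCC TRACEON (1222, -1);  -- -1 = all sessions; deadlock graphs go to the error log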
