I am testing a process that deletes many, many records at once. I cannot use TRUNCATE TABLE, because the table also contains records that need to stay.
Because of the volume, I have broken the delete into a loop similar to this:
-- Fail immediately with error 1222 instead of waiting if records are locked.
SET LOCK_TIMEOUT 0
-- This process should be chosen as a deadlock victim in the case of a deadlock.
SET DEADLOCK_PRIORITY LOW
SET NOCOUNT ON
DECLARE @Count INT
SET @Count = 1
WHILE @Count > 0
BEGIN TRY
BEGIN TRANSACTION -- added per comment below
DELETE TOP (1000) FROM MyTable WITH (ROWLOCK, READPAST) WHERE MyField = SomeValue
SET @Count = @@ROWCOUNT
COMMIT
END TRY
BEGIN CATCH
exec sp_lock -- added to display the open locks after the timeout
exec sp_who2 -- shows the active processes
IF @@TRANCOUNT > 0
ROLLBACK
RETURN -- ignoring this error for brevity
END CATCH
MyTable is a clustered table. MyField is the first column in the clustered index. It indicates a logical grouping of records, so MyField = SomeValue often selects many records. I don't care in which order they are deleted, so long as one group is processed at a time. There are no other indexes on this table.
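For context, the table shape is roughly this (every column name other than MyField is illustrative, not the real schema):

```sql
-- Illustrative schema only; real column names and types differ.
CREATE TABLE MyTable (
    MyField  INT NOT NULL,         -- logical group key, leading clustered key column
    RecordId INT NOT NULL,         -- illustrative second key column
    Payload  VARCHAR(100) NULL,
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (MyField, RecordId)
);
```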
I added the ROWLOCK hint to try to avoid the lock escalations we have seen in production. I added the READPAST hint to avoid deleting records locked by other processes. That should never happen, but I am trying to be safe.
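For reference, SQL Server 2005 has no per-table escalation setting; escalation can only be disabled instance-wide with a trace flag. A sketch of that option (not something I have enabled):

```sql
-- Trace flag 1224 disables lock escalation based on lock count
-- (escalation can still occur under memory pressure); -1 = instance-wide.
DBCC TRACEON (1224, -1)
```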
Problem: sometimes this loop hits a lock timeout 1222 "Lock request time out period exceeded" when it is the only thing running.
I am positive there is no other activity on this system while I am testing this process, because it is my own developer box, nobody else is connected, there are no other processes running on it, and the profiler shows no activity.
I can re-run the same script a second later and it picks up where it left off, happily deleting records, until the next lock timeout.
I have tried a BEGIN TRY / BEGIN CATCH to ignore the 1222 error and retry the delete, but it fails again immediately with the same lock timeout error. It also fails again if I add a short delay before retrying.
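The retry attempt looked roughly like this (the retry limit and delay length are illustrative, and SomeValue is the same placeholder as above):

```sql
DECLARE @Count INT, @Retry INT
SET @Count = 1
SET @Retry = 0
WHILE @Count > 0 AND @Retry < 5
BEGIN TRY
    DELETE TOP (1000) FROM MyTable WITH (ROWLOCK, READPAST)
        WHERE MyField = SomeValue
    SET @Count = @@ROWCOUNT
    SET @Retry = 0                     -- reset after a successful batch
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() <> 1222 RETURN   -- only retry lock timeouts
    SET @Retry = @Retry + 1
    WAITFOR DELAY '00:00:05'           -- short pause before retrying
END CATCH
```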
I assume the lock timeouts are because of something like a page split, but I am not sure why this would conflict with the current loop iteration. The prior delete statement should have already completed, and I thought that meant any page splits were also finished.
Why is the DELETE loop hitting a lock timeout against itself?
Is there a way the process can avoid this lock timeout or detect that it is safe to resume?
This is on SQL Server 2005.
-- EDIT --
I added the Lock:Timeout event to the profiler. It's timing out on a PAGELOCK during the delete:
Event Class: Lock:Timeout
TextData: 1:15634 (one example of several)
Mode: 7 - IU
Type: 6 - PAGE
DBCC PAGE reports these pages are outside of the range of the master database (ID 1).
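To check one of the reported pages (1:15634 from the trace above), I used the undocumented DBCC PAGE command, roughly like this (the database name is whichever database the delete runs in):

```sql
DBCC TRACEON (3604)               -- send DBCC output to the client instead of the error log
DBCC PAGE (tempdb, 1, 15634, 0)   -- (database, file id, page id, print option)
```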
-- EDIT 2 --
I added a BEGIN TRY / BEGIN CATCH and ran exec sp_lock in the catch block. Here is what I saw:
spid dbid ObjId      IndId Type Resource Mode Status
19   2    1401108082 1     PAG  1:52841  X    GRANT   (tempdb.dbo.MyTable)
19   2    1401108082 0     TAB           IX   GRANT   (tempdb.dbo.MyTable)
Me   2    1401108082 0     TAB           IX   GRANT   (tempdb.dbo.MyTable)
Me   1    1115151018 0     TAB           IS   GRANT   (master..spt_values) (?)
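On 2005 the same information is also available from the locking DMV, which shows which session owns each lock; a sketch of the query I could run in the catch block instead of sp_lock:

```sql
SELECT resource_type, resource_database_id, resource_description,
       request_mode, request_status, request_session_id
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('tempdb');
```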
SPID 19 is a SQL Server TASK MANAGER. Why would one of these task managers be acquiring locks on MyTable?