Inserting a large number of records without locking the table

I am trying to insert 1,500,000 records into a table, but I am facing table lock issues during the insertion. So I came up with the batch insert below.

DECLARE @BatchSize INT = 50000

WHILE 1 = 1
  BEGIN
      INSERT INTO [dbo].[Destination] 
                  (proj_details_sid,
                   period_sid,
                   sales,
                   units)
      SELECT TOP(@BatchSize) s.proj_details_sid,
                             s.period_sid,
                             s.sales,
                             s.units
      FROM   [dbo].[SOURCE] s
      WHERE  NOT EXISTS (SELECT 1
                         FROM   dbo.Destination d
                         WHERE  d.proj_details_sid = s.proj_details_sid
                                AND d.period_sid = s.period_sid)

      IF @@ROWCOUNT < @BatchSize
        BREAK
  END 

I have a clustered index on the Destination table (proj_details_sid, period_sid). The NOT EXISTS part is just to prevent already-inserted records from being inserted into the table again.

Am I doing this right? Will it avoid a table lock, or is there a better way?

Note: the time taken is more or less the same with and without the batching.

Deoxygenate answered 9/9, 2016 at 6:9 Comment(13)
Add WITH (NOLOCK) inside the EXISTS clause to avoid waiting for a lock.Primus
@AksheyBhat Will adding NOLOCK cause any problems here due to dirty reads?Arsphenamine
@AksheyBhat Just curious: the SQL will insert data into the Destination table, so NOLOCK will have no effect in the EXISTS clause, won't it?Revive
@Alex WITH (NOLOCK) is a table hint which specifies that the query shouldn't wait for a lock on the table before executing. The query will take into account all the committed and uncommitted data in the table.Primus
Lock escalation will be attempted after 5000 locks are taken out. I don't know why the answers so far are focussing on the SELECT part. It is the locks taken out by the newly inserted rows in the batch that are likely to cause any lock escalation issues not the SELECT part. At default isolation level the SELECT will release the row lock as soon as it is read anyway.Rubbery
@MartinSmith - Sorry, I didn't fully understand what you meant. Can you please elaborate?Arsphenamine
This question similar to your question: #2857820Revive
@AksheyBhat, per msdn.microsoft.com/en-us/library/ms187373.aspx, it states READUNCOMMITTED and NOLOCK cannot be specified for tables modified by insert, update, or delete operations, so doesn't that apply to the table in the EXISTS clause?Revive
@Alex Doesn't that apply only to UPDATE and DELETE statements? There is only a SELECT statement in the EXISTS clause.Primus
RE: elaboration - if your newly inserted rows are being protected by row locks then you will want to make the batch size 5,000 not 50,000 to prevent lock escalation.Rubbery
@AksheyBhat From that page, the full sentence is READUNCOMMITTED and NOLOCK cannot be specified for tables modified by insert, update, or delete operations. The SQL Server query optimizer ignores the READUNCOMMITTED and NOLOCK hints in the FROM clause that apply to the target table of an UPDATE or DELETE statement. And shouldn't the EXISTS clause count as part of the FROM clause?Revive
If your clustered index is unique, you could get rid of EXISTS and use WITH IGNORE_DUP_KEY instead, right? I imagine that would be more efficient as well. See https://mcmap.net/q/166438/-avoid-duplicates-in-insert-into-select-query-in-sql-serverGamble
@MartinSmith - When I reduce the batch size the insert becomes really slowArsphenamine

Lock escalation is not likely to be related to the SELECT part of your statement at all.

It is a natural consequence of inserting a large number of rows.

Lock escalation is triggered when lock escalation is not disabled on the table by using the ALTER TABLE SET LOCK_ESCALATION option, and when either of the following conditions exists:

  • A single Transact-SQL statement acquires at least 5,000 locks on a single nonpartitioned table or index.
  • A single Transact-SQL statement acquires at least 5,000 locks on a single partition of a partitioned table and the ALTER TABLE SET LOCK_ESCALATION option is set to AUTO.
  • The number of locks in an instance of the Database Engine exceeds memory or configuration thresholds.

If locks cannot be escalated because of lock conflicts, the Database Engine periodically triggers lock escalation at every 1,250 new locks acquired.

You can easily see this for yourself by tracing the lock escalation event in Profiler, or simply by trying the below with different batch sizes. For me, TOP (6228) shows 6,250 locks held, but with TOP (6229) the count suddenly plummets to 1 as lock escalation kicks in. The exact numbers may vary (dependent on database settings and the resources currently available). Use trial and error to find the threshold where lock escalation appears for you.

CREATE TABLE [dbo].[Destination]
  (
     proj_details_sid INT,
     period_sid       INT,
     sales            INT,
     units            INT
  )

BEGIN TRAN --So locks are held for us to count in the next statement
INSERT INTO [dbo].[Destination]
SELECT TOP (6229) 1,
                  1,
                  1,
                  1
FROM   master..spt_values v1,
       master..spt_values v2

SELECT COUNT(*)
FROM   sys.dm_tran_locks
WHERE  request_session_id = @@SPID;

COMMIT

DROP TABLE [dbo].[Destination] 

You are inserting 50,000 rows per batch, so lock escalation will almost certainly be attempted.

The article How to resolve blocking problems that are caused by lock escalation in SQL Server is quite old but a lot of the suggestions are still valid.

  1. Break up large batch operations into several smaller operations (i.e. use a smaller batch size)
  2. Lock escalation cannot occur if a different SPID is currently holding an incompatible table lock - the example they give is a different session executing:

BEGIN TRAN
SELECT * FROM mytable (UPDLOCK, HOLDLOCK) WHERE 1=0
WAITFOR DELAY '1:00:00'
COMMIT TRAN 
  3. Disable lock escalation by enabling trace flag 1211 - however, this is a global setting and can cause severe issues. There is a newer option, 1224, that is less problematic, but it is still global.

Another option would be to ALTER TABLE blah SET (LOCK_ESCALATION = DISABLE) but this is still not very targeted as it affects all queries against the table not just your single scenario here.
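If you did want to experiment with that option anyway, a minimal sketch (using the Destination table name from the question) would be to disable escalation only for the duration of the load and restore the default afterwards:

```sql
-- Sketch only: disable lock escalation just for the load, then restore.
-- Remember this affects every query against the table while it is in effect.
ALTER TABLE dbo.Destination SET (LOCK_ESCALATION = DISABLE);

-- ... run the batched INSERT loop here ...

-- TABLE is the default setting
ALTER TABLE dbo.Destination SET (LOCK_ESCALATION = TABLE);
```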

So I would opt for option 1 or possibly option 2 and discount the others.
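For option 1, a sketch of the loop from the question with the batch size kept under the 5,000-lock threshold (4,500 here is an arbitrary safety margin, not a tested value - tune it for your system) might look like:

```sql
-- Same loop as the question, but with @BatchSize below the 5,000-lock
-- escalation threshold so each batch stays on row/page locks.
DECLARE @BatchSize INT = 4500;

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.Destination (proj_details_sid, period_sid, sales, units)
    SELECT TOP (@BatchSize) s.proj_details_sid, s.period_sid, s.sales, s.units
    FROM   dbo.SOURCE s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.Destination d
                       WHERE  d.proj_details_sid = s.proj_details_sid
                              AND d.period_sid = s.period_sid);

    IF @@ROWCOUNT < @BatchSize
        BREAK;
END
```

Under autocommit, each iteration is its own transaction, so the row locks taken by one batch are released before the next batch begins.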

Rubbery answered 10/9, 2016 at 18:53 Comment(0)

Instead of checking whether the data already exists in Destination on every iteration, it seems better to stage all the missing rows in a temp table first, then batch-insert them into Destination.

Reference: Using ROWLOCK in an INSERT statement (SQL Server)

DECLARE @batch int = 100
DECLARE @curRecord int = 1
DECLARE @maxRecord int

-- remove (NOLOCK) if you don't want dirty reads
SELECT ROW_NUMBER() OVER (ORDER BY s.proj_details_sid, s.period_sid) AS rownum,
       s.proj_details_sid,
       s.period_sid,
       s.sales,
       s.units
INTO #Temp
FROM   [dbo].[SOURCE] s WITH (NOLOCK)
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.Destination d WITH (NOLOCK)
                   WHERE  d.proj_details_sid = s.proj_details_sid
                          AND d.period_sid = s.period_sid)

-- change this maxRecord if you want to limit the records to insert
SELECT @maxRecord = count(1) from #Temp

WHILE @maxRecord >= @curRecord
   BEGIN
       INSERT INTO [dbo].[Destination] 
              (proj_details_sid,
               period_sid,
               sales,
               units)
       SELECT proj_details_sid, period_sid, sales, units
       FROM #Temp
       WHERE rownum >= @curRecord and rownum < @curRecord + @batch

       SET @curRecord = @curRecord + @batch
   END

DROP TABLE #Temp
Revive answered 9/9, 2016 at 8:12 Comment(2)
Will give it try by MondayArsphenamine
This one is faster than my method.Arsphenamine

I added (NOLOCK) to your destination table inside the EXISTS clause -> dbo.Destination WITH (NOLOCK). Now the SELECT part won't wait for locks on that table.

DECLARE @BatchSize INT = 50000

WHILE 1 = 1
  BEGIN
      INSERT INTO [dbo].[Destination] 
                  (proj_details_sid,
                   period_sid,
                   sales,
                   units)
      SELECT TOP(@BatchSize) s.proj_details_sid,
                             s.period_sid,
                             s.sales,
                             s.units
      FROM   [dbo].[SOURCE] s
      WHERE  NOT EXISTS (SELECT 1
                         FROM   dbo.Destination d WITH (NOLOCK)
                         WHERE  d.proj_details_sid = s.proj_details_sid
                                AND d.period_sid = s.period_sid)

      IF @@ROWCOUNT < @BatchSize
        BREAK
  END 
Gamecock answered 9/9, 2016 at 6:17 Comment(3)
Will adding NOLOCK cause any problems here due to dirty reads?Arsphenamine
If your program won't run in parallel, dirty reads aren't a problem.Gamecock
In the above situation, NOLOCK will not cause any problem.Naivete

To do this you can use WITH (NOLOCK) in your SELECT statement. But NOLOCK is not recommended on OLTP databases.

Naivete answered 9/9, 2016 at 7:37 Comment(0)
