How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table

11

34

Let's say we have a table Sales with 30 columns and 500,000 rows. I would like to delete 400,000 rows in the table (those where toDelete='1').

But I have a few constraints :

  • the table is read/written "often", so I would not like a long DELETE to lock the table for too long
  • I would like to skip the transaction log (as a TRUNCATE does) while still doing a DELETE ... WHERE (I need the condition), but I haven't found any way to do this...

Any advice would be welcome to transform a

DELETE FROM Sales WHERE toDelete='1'

to something more partitioned & possibly transaction log free.

Goforth answered 27/6, 2012 at 15:48 Comment(4)
Why do you need to skip the transaction log?Iphlgenia
Would appreciate it if, after you're done, you'd post the optimal solution (or at least the one that worked best for you).Tenner
@thecoon: I definitely will. Thanks to ALL for your various, complementary answers.Goforth
Review the recovery model. msdn.microsoft.com/en-us/library/ms189275.aspx You can break up the delete, but if the recovery model is FULL, all the deletes will still be in the log (maybe what you want) until a log backup. For reads, you could use WITH (NOLOCK) if dirty reads are OK.Donetsk
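For reference, a minimal sketch of reviewing and changing the recovery model mentioned above (MyDatabase is a placeholder; note that DELETE statements are always fully logged, so SIMPLE recovery does not skip logging, it only lets the log space of committed batches be reused after a checkpoint):

-- Check the current recovery model (MyDatabase is a placeholder)
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'MyDatabase';

-- SIMPLE recovery lets committed log space be reused after a checkpoint;
-- each batched DELETE is still fully logged while it runs.
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;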
52

Calling DELETE FROM TableName will do the entire delete in one large transaction. This is expensive.

Here is another option, which will delete rows in batches:

deleteMore:
DELETE TOP (10000) FROM Sales WHERE toDelete = '1'
IF @@ROWCOUNT != 0
    goto deleteMore
Eloquent answered 27/6, 2012 at 15:55 Comment(3)
Oh my! How did I never realise that you can put a TOP() expression in a DELETE statement? Much more concise than my functionally identical answer! +1 to you sir!Turner
Frankly I didn't even remember you could use labels in SQL 2008. I'd rather see a WHILE statement. WHILE 1 = 1 BEGIN; DELETE ...; IF @@RowCount = 0 BREAK; END; This to me is clearer for the next sql writer who comes along that a loop is occurring, rather than figuring out the awful GOTO.Bunni
Beautiful! Oh, and I can easily parse a single GOTO. It was those messes we'd make in BASIC and FORTRAN that got hard to think about.Tenne
28

I'll leave my answer here, since I was able to test different approaches for mass delete and update (I had to update and then delete 125+ million rows; the server had 16 GB of RAM and a Xeon E5-2680 @ 2.7 GHz, running SQL Server 2012).

TL;DR: always update/delete by primary key, never by any other condition. If you can't use the PK directly, create a temp table, fill it with PK values, and update/delete your table using that table. Use indexes for this (a sketch follows the timings below).

I started with the solution from above (by @Kevin Aenmey), but this approach turned out to be inappropriate, since my database was live, handling a couple of hundred transactions per second, and there was some blocking involved (there was an index for all three fields from the condition; using WITH(ROWLOCK) didn't change anything).

So, I added a WAITFOR statement, which allowed the database to process other transactions.

deleteMore:
WAITFOR DELAY '00:00:01'
DELETE TOP(1000) FROM MyTable WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3
IF @@ROWCOUNT != 0
    goto deleteMore

This approach was able to process ~1.6 million rows/hour for updates and ~0.2 million rows/hour for deletes.

Turning to temp tables changed things quite a lot.

deleteMore:
SELECT TOP 10000 Id /* Id is the PK */
  INTO #Temp 
  FROM MyTable WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3 

DELETE MT
  FROM MyTable MT
  JOIN #Temp T ON T.Id = MT.Id 

/* you can use the IN operator, it doesn't change anything:
   DELETE FROM MyTable WHERE Id IN (SELECT Id FROM #Temp) */
IF @@ROWCOUNT > 0 BEGIN
    DROP TABLE #Temp
    WAITFOR DELAY '00:00:01'
    goto deleteMore
END ELSE BEGIN
    DROP TABLE #Temp
    PRINT 'This is the end, my friend'
END

This solution processed ~25 million rows/hour for updates (15x faster) and ~2.2 million rows/hour for deletes (11x faster).
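On the TL;DR's "use indexes" advice: SELECT ... INTO creates #Temp as an unindexed heap, so one option (an assumption on my part, not something from the timed runs above) is to cluster the temp table on Id before the join:

-- Hypothetical: cluster #Temp on Id so the join back to MyTable
-- becomes an ordered seek instead of a heap scan
CREATE CLUSTERED INDEX IX_Temp_Id ON #Temp (Id);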

Knock answered 6/3, 2019 at 10:39 Comment(2)
Nice, just the same thing that I was looking into. Saved me research time ;) Thanks a lot!Sanguinary
I am curious: why does filling your own temp table make it faster than the DELETE TOP version? Doesn't the ordinary DELETE optimization use indexes to look up the row by primary key anyway, which is analogous to this? Could the differences be because you did DELETE TOP 1000 vs. SELECT TOP 10000? (one thousand vs. ten thousand?) Could you run the timings again with the same TOP number?Mathur
11

What you want is batch processing.

WHILE (SELECT COUNT(*) FROM sales WHERE toDelete = 1) > 0
BEGIN
    DELETE FROM sales WHERE SalesID IN
        (SELECT TOP 1000 SalesID FROM sales WHERE toDelete = 1)
END

Of course you can experiment to find the best batch size; I've used anywhere from 500 to 50,000 depending on the table. If you use cascade delete, you will probably need a smaller number, as you have those child records to delete as well.

Hardball answered 27/6, 2012 at 15:56 Comment(4)
The repeated calls to COUNT(*) seem unnecessary when @@rowcount can be used. But it is a very portable solution to other RDBMSes...Turner
True, I whipped this up in a hurry and didn't take time to optimize completely. And I am not really used to DELETE having the TOP keyword; hard to go against years of practice sometimes.Hardball
Just removed the : at the last deleteMore label because it gave me a "label was already declared" error. Worked fine with this small correction.Louielouis
What about WHILE EXISTS (SELECT * FROM sales WHERE toDelete = 1) as a conditionFluoride
5

One way I have had to do this in the past is to have a stored procedure or script that deletes n records. Repeat until done.

DELETE TOP (1000) FROM Sales WHERE toDelete='1'
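A minimal sketch of wrapping this in such a stored procedure (the procedure name and default batch size are illustrative assumptions):

-- Deletes one batch per call; repeat from the caller or an Agent job
-- until the procedure reports that no rows were deleted.
CREATE PROCEDURE dbo.Sales_DeleteFlaggedBatch
    @BatchSize INT = 1000
AS
BEGIN
    SET NOCOUNT ON;
    DELETE TOP (@BatchSize) FROM Sales WHERE toDelete = '1';
    RETURN @@ROWCOUNT; -- 0 means nothing left to delete
END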
Kort answered 27/6, 2012 at 15:52 Comment(11)
Every time you shrink a database, a kitten dies! http://www.sqlskills.com/BLOGS/PAUL/post/Why-you-should-not-shrink-your-data-files.aspxDisinherit
To clarify (as it's a bit of a hack), I was doing this due to disk-space constraints on my logging directory at the time, rather than locking. I guess the locked-duration with this is the same, just spread out longer :)Kort
Shrink the database? No, please don't do that. Hopefully you meant checkpoint or something...Spradling
@JoeStefanelli In an ideal world, perhaps. We don't all have infinite data-storage, and sometimes if you've deleted 90% of your database, it's okay to clean up your data files too. And if you're talking about index fragmentation, well just re-index.Kort
it's okay to clean up your data files - after a complete operation that is atypical and in a case where you know you won't need that space again? Maybe, but that's very edge case. After every delete? I disagree 100%.Spradling
I wasn't talking about after every delete.Kort
@Kort then fix the wording in your answer. deletes n records, then shrinks the database. Repeat until done. This is horrible advice.Spradling
I will, but read the answer. I didn't say "do this". I said "this is how I've done it before", and I intentionally didn't give code because it's a too domain-specific question.Kort
@AaronBertrand - I agree, I wouldn't shrink after every delete. I can see how it helped Cylindric slowly paint his way out of a lack of disk space, but it's not appropriate here.Turner
@Kort I don't now how anyone will take your answer except as an answer. If you don't want it to be an answer taken that way, then it should be a comment.Spradling
Here's an updated link for @JoeStefanelli's comment: sqlskills.com/blogs/paul/…Lentil
3

You should try to give it a ROWLOCK hint so it will not lock the entire table. However, if you delete a lot of rows, lock escalation will occur.

Also, make sure you have a non-clustered filtered index (only for '1' values) on the toDelete column; a sketch of such an index follows the example below. If possible, make it a bit column rather than varchar (or whatever it is now).

DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
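A sketch of the filtered index described above (the index name is an assumption), plus the table-level option that controls lock escalation if it becomes a problem:

-- Filtered non-clustered index covering only the rows flagged for deletion
CREATE NONCLUSTERED INDEX IX_Sales_toDelete
ON Sales (toDelete)
WHERE toDelete = '1';

-- Optional: stop row/page locks from escalating to a table lock
ALTER TABLE Sales SET (LOCK_ESCALATION = DISABLE);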

Ultimately, you can try to iterate over the table and delete in chunks.

Updated

Since while loops and chunk deletes are the new pink here, I'll throw in my version too (combined with my previous answer):

SET ROWCOUNT 100
DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'

WHILE @@rowcount > 0
BEGIN
  SET ROWCOUNT 100
  DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'  
END
Tenner answered 27/6, 2012 at 15:53 Comment(1)
SET ROWCOUNT is deprecated in SQL 2012.Bunni
3

My own take on this functionality would be as follows. This way there is no repeated code and you can manage your chunk size.

DECLARE @DeleteChunk INT = 10000
DECLARE @rowcount INT = 1

WHILE @rowcount > 0
BEGIN

  DELETE TOP (@DeleteChunk) FROM Sales WITH(ROWLOCK) WHERE toDelete = '1'

  SELECT @rowcount = @@RowCount
END
Monochromat answered 23/11, 2016 at 14:37 Comment(1)
Should be: SET @rowcount = @@RowCount ?Audriaaudrie
1

A shorter version using WHILE and @@ROWCOUNT, without any temp variables etc.:

WHILE 1 = 1 --endless loop
BEGIN
    DELETE TOP (1000) FROM Sales WHERE blahblah
    IF (@@ROWCOUNT = 0) BREAK;
END

You can optionally add a delay inside the loop:

WAITFOR DELAY '00:00:00.500' --500ms delay
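Combined, a sketch (the delay goes after the exit check so the loop doesn't sleep once the work is done):

WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM Sales WHERE blahblah
    IF (@@ROWCOUNT = 0) BREAK;
    WAITFOR DELAY '00:00:00.500' --500ms pause between batches
END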
Plinth answered 30/4, 2023 at 19:50 Comment(0)
0

I have used the below to delete around 50 million records:

DECLARE @BatchSize INT = 1000 -- illustrative value; see the note on batch size below

BEGIN TRANSACTION
     DeleteOperation:
     DELETE TOP (@BatchSize)
     FROM [database_name].[database_schema].[database_table]

     IF @@ROWCOUNT > 0
     GOTO DeleteOperation
COMMIT TRANSACTION

Please note that keeping the batch size below 5,000 rows is less expensive on resources; roughly 5,000 locks is the threshold at which SQL Server attempts to escalate row locks to a table lock.

Ludwigg answered 9/1, 2017 at 5:47 Comment(1)
Why answer the same as the accepted answer?Selwin
0

In my experience, the best way to delete a huge number of records is to delete them by primary key.

So you have to generate a T-SQL script that contains one DELETE statement per row, and then execute that script.

For example, the code below will generate that script:

GO
SET NOCOUNT ON

SELECT   'DELETE FROM  DATA_ACTION WHERE ID = ' + CAST(ID AS VARCHAR(50)) + ';' + CHAR(13) + CHAR(10) + 'GO'
FROM    DATA_ACTION
WHERE  YEAR(AtTime) = 2014

The output file will have records like:

DELETE FROM  DATA_ACTION WHERE ID = 123;
GO
DELETE FROM  DATA_ACTION WHERE ID = 124;
GO
DELETE FROM  DATA_ACTION WHERE ID = 125;
GO

Now you have to use the SQLCMD utility in order to execute this script:

sqlcmd -S [Instance Name] -E -d [Database] -i [Script]
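To produce the script file in the first place, the generator query can itself be run through sqlcmd, sending its output to a file (file names are illustrative; -h -1 suppresses the column headers):

sqlcmd -S [Instance Name] -E -d [Database] -i generate_deletes.sql -h -1 -o delete_script.sql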

You can find this approach explained here: https://www.mssqltips.com/sqlservertip/3566/deleting-historical-data-from-a-large-highly-concurrent-sql-server-database-table/

Bauman answered 30/8, 2017 at 8:29 Comment(0)
0

Here's how I do it when I know approximately how many iterations are needed (the GO 20 at the end makes SSMS or sqlcmd run the preceding batch 20 times):

delete from Activities with (rowlock)
where Id in (select top 999 Id from Activities (nolock)
             where description like 'financial data update date%'
               and len(description) = 87
               and User_Id = 2);
waitfor delay '00:00:02'
GO 20

Edit: This worked better and faster for me than the SELECT TOP version:

declare @counter int = 1
declare @msg varchar(max)
declare @batch int = 499

while (@counter <= 37600)
begin
    set @msg = ('Iteration count = ' + convert(varchar, @counter))
    raiserror(@msg, 0, 1) with nowait

    delete Activities with (rowlock)
    where Id in (select Id from Activities (nolock)
                 where description like 'financial data update date%'
                   and len(description) = 87
                   and User_Id = 2
                 order by Id asc
                 offset 1 rows fetch next @batch rows only)

    set @counter = @counter + 1
    waitfor delay '00:00:02'
end
Felipafelipe answered 10/3, 2021 at 11:39 Comment(0)
-1
DECLARE @counter INT
SET @counter = 10 -- (you can always obtain the number of rows to be deleted and set the counter to that value)

WHILE @counter > 0
BEGIN
    DELETE TOP (4000) FROM <Tablename> WHERE ID IN
        (SELECT ID FROM <sametablename> WITH (NOLOCK) WHERE DateField < '2021-01-04') -- or opt for GETDATE() - 1
    SET @counter = @counter - 1 -- or SET @counter = @counter - 4000 if you know the number of rows to be deleted
END
Sideward answered 13/9, 2021 at 15:26 Comment(2)
This code-only solution is a variation on the general advice already provided. You should include arguments for why this solution is superior to the others, and please also format your code block to help with readability. For what it's worth, it's not a bad solution, but as it stands this is a low-quality answer.Echinoderm
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Shadowgraph
