How to delete a large amount of data from a table in SQL Server without logging?

I have a large data table. There are 10 million records in this table.

What is the best way to run this query?

   Delete LargeTable where readTime < dateadd(MONTH,-7,GETDATE())
Egmont answered 13/6, 2014 at 20:22 Comment(5)
:) I am afraid that unless you are willing to write some sort of ETL to get all the rows with readTime >= dateadd(MONTH,-7,GETDATE()) into another table, then issue a TRUNCATE TABLE and put the data back using ETL, you will not be able to prevent it from writing to the log.Riposte
Logging is an all-or-nothing function of having resilient transactions. It literally doesn't make sense to have a log for some operations but not others; otherwise the log is useless.Claypoole
Export the data you want to keep, truncate the table, then import back inDissatisfaction
Another option would be using a table variable, which is minimally logged. Hence, store your readTime >= dateadd(MONTH,-7,GETDATE()) data in a table variable, then truncate the original table and copy the data back from the table variable. I would, however, keep a backup of the data in case something goes wrong and the table gets inadvertently truncated. :) And always do a test run of your script on a lesser environment.Riposte
Possible duplicate of How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows tableIndividual
269
  1. If you are deleting all the rows in that table, the simplest option is to truncate the table, something like

     TRUNCATE TABLE LargeTable
     GO
    

TRUNCATE TABLE simply empties the table; you cannot use a WHERE clause to limit the rows being deleted, and no triggers will be fired.

  2. On the other hand, if you are deleting more than 80-90 percent of the data (say you have a total of 11 million rows and want to delete 10 million), another way is to insert the 1 million rows you want to keep into a staging table, truncate this large table, and insert those 1 million rows back (a sketch of this appears at the end of this answer).

  3. Or, if permissions, views, or other objects that have this large table as their underlying table are not affected by dropping it, you can copy this relatively small number of rows into another table, drop the large table, create another table with the same schema, and import the rows back into this ex-large table.

  4. One last option I can think of is to change your database's recovery model to SIMPLE and then delete rows in smaller batches using a while loop, something like this:

     DECLARE @Deleted_Rows INT;
     SET @Deleted_Rows = 1;

     WHILE (@Deleted_Rows > 0)
     BEGIN
         -- Delete some small number of rows at a time
         DELETE TOP (10000) LargeTable
         WHERE readTime < dateadd(MONTH,-7,GETDATE())

         SET @Deleted_Rows = @@ROWCOUNT;
     END
    

And don't forget to change the recovery model back to FULL afterwards; I think you also have to take a full backup for the change of recovery model to be fully effective.
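To illustrate option 2, here is a minimal sketch that assumes a hypothetical staging table dbo.LargeTable_Keep and that LargeTable has no IDENTITY column and no foreign keys referencing it (if it does have an IDENTITY column, list the columns explicitly and wrap the second INSERT in SET IDENTITY_INSERT ... ON/OFF):

SELECT *
INTO dbo.LargeTable_Keep                      -- staging table, created on the fly
FROM dbo.LargeTable
WHERE readTime >= dateadd(MONTH,-7,GETDATE());

TRUNCATE TABLE dbo.LargeTable;                -- fast, minimally logged; fails if FKs reference the table

INSERT INTO dbo.LargeTable WITH (TABLOCK)     -- TABLOCK can enable minimal logging under SIMPLE/BULK_LOGGED recovery
SELECT * FROM dbo.LargeTable_Keep;

DROP TABLE dbo.LargeTable_Keep;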

Michellemichels answered 13/6, 2014 at 20:50 Comment(8)
Also remember that if you truncate a table, you cannot have any FKs associated with it.Provincetown
But how can you be sure that you are deleting 80-90% of the data? Let's assume I only have a range of values that should be deleted, and I have a few tables. So I have to check every one of them and calculate the percentage, and if it is around 30% I guess this method is not very effective... I'm trying to find an optimal solution for the unknown case.Infeudation
@Infeudation An optimal solution for the unknown case: that's the dream, isn't it? Unfortunately you cannot cure every disease with one pill; I have suggested some possible solutions for different scenarios. There is no silver bullet here, unfortunately.Michellemichels
One thing to note if choosing option 4: depending on how the table is used, it may be better to delete fewer than 5000 rows at a time to avoid lock escalation.Fungistat
If the count of records to delete is much, much bigger than the count of records that will remain in the table, I found that a simple SELECT INTO a temp table of the records that will stay, followed by dropping the original table and renaming the temp table, is much faster, provided you don't use the identity Id as a foreign key somewhere.Mulvihill
2nd option is the best idea imoCale
We have a table with 150 million very wide records. We periodically need to purge 50 million or so. We sometimes overflow TempDB when purging. Your technique is similar to what I was imagining. Do you have a recommendation for how many records, TOP (xxx), to delete with each step of the while loop? For best performance, is it better to have a lower number or a higher one?Basifixed
Depending on the clustered index, DELETE TOP (10000) may use a table scan to reach rows that fulfill the condition, and this may take time. In my case it was more effective to divide the data into slices by PK and delete with both readTime < dateadd(...) and pk > pk_start AND pk <= pk_end conditions.Mentality
148

@m-ali's answer is right, but also keep in mind that the log could grow a lot if you don't commit the transaction after each chunk and perform a CHECKPOINT. This is how I would do it, taking this article http://sqlperformance.com/2013/03/io-subsystem/chunk-deletes as a reference, with performance tests and graphs:

DECLARE @Deleted_Rows INT;
SET @Deleted_Rows = 1;


WHILE (@Deleted_Rows > 0)
  BEGIN

   BEGIN TRANSACTION

   -- Delete some small number of rows at a time
     DELETE TOP (10000)  LargeTable 
     WHERE readTime < dateadd(MONTH,-7,GETDATE())

     SET @Deleted_Rows = @@ROWCOUNT;

   COMMIT TRANSACTION
   CHECKPOINT -- for simple recovery model
END
Sibeal answered 4/2, 2015 at 15:4 Comment(5)
This should be the accepted answer in case available disk space is limited. Without COMMIT TRANSACTION and CHECKPOINT the log keeps growing. Thanks for making this clear.Chandra
+1. Just note that you might want to compare @Deleted_Rows to 10000, or you might end up with an infinite loop due to it indefinitely deleting small sets of data. So WHILE (@Deleted_Rows = 10000) - as soon as there wasn't a full "page" of data to delete it will stop. In your implementation, WHILE (@Deleted_Rows > 0), the while-loop will execute again even if it only deleted one row, and the next execution might also find a row or two to delete - resulting in an infinite loop.Argueta
@NSduToit the WHERE clause is considering records that are at least 7 months old so there won't be new records that fulfil that condition while you are performing the deletion.Sibeal
@FranciscoGoldenstein Well, the date used in the query will be different with each iteration as you repeatedly calculate the date within the WHILE loop itself: dateadd(MONTH,-7,GETDATE()).Argueta
@FranciscoGoldenstein Also, maybe for other use cases than this one - maybe new data gets added to the underlying table that will result in new records that can be deleted between different iterations of the WHILE loop.Argueta
71

You can also use GO followed by how many times you want to execute the same batch.

DELETE TOP (10000)  [TARGETDATABASE].[SCHEMA].[TARGETTABLE] 
WHERE readTime < dateadd(MONTH,-1,GETDATE());
-- how many times you want the query to repeat
GO 100
Distortion answered 25/11, 2016 at 10:5 Comment(6)
I like this; it is working for me. I accidentally inserted the same row into a table 26 million times and needed to delete all occurrences of it, which in one single delete statement ran the server out of memory, so this is great. One question: will it stop mid-loop if it runs out of rows to delete?Callaway
@ScottC, it is not a loop; it just repeats the batch, and if you run out of rows it cannot delete anything, but it will not stop. You will get something like (0 row(s) affected) if it runs out of rows to delete.Distortion
ah, yes I discovered that about 5 mins after I posted my question, since my delete finished, thanks this was very helpful!Callaway
From which version of MS SQL Server is this GO xx syntax supposed to work? I get a "Could not find stored procedure ''" error. Without the GO command it works fine, though.Err
Hmm, it seems like I can execute it, and it does indeed run multiple times, but in SQL Server Management Studio it shows the red squiggly line with the mentioned error (running with F5 works, though).Err
No clue whatsoever why this is happening, but it does indeed run.Distortion
15

@Francisco Goldenstein, just a minor correction. The COMMIT must be used after you set the variable, otherwise the WHILE will be executed just once:

DECLARE @Deleted_Rows INT;
SET @Deleted_Rows = 1;

WHILE (@Deleted_Rows > 0)
BEGIN
    BEGIN TRANSACTION

    -- Delete some small number of rows at a time
    DELETE TOP (10000)  LargeTable 
    WHERE readTime < dateadd(MONTH,-7,GETDATE())

    SET @Deleted_Rows = @@ROWCOUNT;

    COMMIT TRANSACTION
    CHECKPOINT -- for simple recovery model

END
Recurrent answered 4/11, 2016 at 13:14 Comment(0)
12

This variation of M.Ali's is working fine for me. It deletes some, clears the log and repeats. I'm watching the log grow, drop and start over.

DECLARE @Deleted_Rows INT;
SET @Deleted_Rows = 1;
WHILE (@Deleted_Rows > 0)
  BEGIN
   -- Delete some small number of rows at a time
    delete top (100000) from InstallLog where DateTime between '2014-12-01' and '2015-02-01'
    SET @Deleted_Rows = @@ROWCOUNT;
    dbcc shrinkfile (MobiControlDB_log,0,truncateonly);
END
Idette answered 14/5, 2015 at 22:13 Comment(1)
This was very useful! I modified it to parameterize the # of rows to delete at a time, and also the WHERE clause. Works like a charm!Epos
10

If you are willing (and able) to implement partitioning, that is an effective technique for removing large quantities of data with little run-time overhead. Not cost-effective for a once-off exercise, though.
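For example, here is a minimal sliding-window sketch with hypothetical names; it assumes dbo.LargeTable is already partitioned on readTime and that dbo.LargeTable_Staging is an empty, non-partitioned table with an identical structure, on the same filegroup as partition 1, which holds the oldest data:

ALTER TABLE dbo.LargeTable
    SWITCH PARTITION 1 TO dbo.LargeTable_Staging;  -- metadata-only operation, effectively instant

TRUNCATE TABLE dbo.LargeTable_Staging;             -- discard the switched-out rows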

Jaquesdalcroze answered 14/6, 2014 at 5:11 Comment(0)
4

I was able to delete 19 million rows from my table of 21 million rows in a matter of minutes. Here is my approach (a T-SQL sketch follows the list below).

If you have an auto-incrementing primary key on this table, then you can make use of it.

  1. Get the minimum value of the primary key in the large table where readTime < dateadd(MONTH,-7,GETDATE()). (Add an index on readTime, if not already present; this index will be deleted along with the table in step 3 anyway.) Let's store it in a variable 'min_primary'.

  2. Insert all the rows having primary key > min_primary into a staging table (an in-memory table if the number of rows is not large).

  3. Drop the large table.

  4. Recreate the table. Copy all the rows from staging table to main table.

  5. Drop the staging table.
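A rough sketch of these steps, assuming a hypothetical key column id that grows with readTime and a staging table dbo.LargeTable_Staging; here the boundary key is taken as the smallest id among the rows to keep, which matches the intent of step 1 under that assumption:

DECLARE @min_primary BIGINT;

SELECT @min_primary = MIN(id)
FROM dbo.LargeTable
WHERE readTime >= dateadd(MONTH,-7,GETDATE());     -- the rows to keep

SELECT *
INTO dbo.LargeTable_Staging                        -- step 2: copy the rows to keep
FROM dbo.LargeTable
WHERE id >= @min_primary;

DROP TABLE dbo.LargeTable;                         -- step 3

-- Step 4: recreate dbo.LargeTable with its original definition (CREATE TABLE ...), then copy the rows back.
-- If id is an IDENTITY column, list the columns explicitly and use SET IDENTITY_INSERT dbo.LargeTable ON/OFF.
INSERT INTO dbo.LargeTable WITH (TABLOCK)
SELECT * FROM dbo.LargeTable_Staging;

DROP TABLE dbo.LargeTable_Staging;                 -- step 5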

Lebanon answered 29/8, 2017 at 9:46 Comment(1)
There is no need to drop the large table. Doing a truncate is also OK and fast.Mcniel
4

Shorter syntax

SELECT 1  -- prime @@ROWCOUNT so the loop body runs at least once
WHILE (@@ROWCOUNT > 0)
BEGIN
  DELETE TOP (10000) LargeTable 
  WHERE readTime < dateadd(MONTH,-7,GETDATE())
END
Jabiru answered 15/3, 2018 at 11:57 Comment(0)
3

You can delete small batches using a while loop, something like this:

DELETE TOP (10000)  LargeTable 
WHERE readTime < dateadd(MONTH,-7,GETDATE())
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (10000)  LargeTable 
    WHERE readTime < dateadd(MONTH,-7,GETDATE())
END
Ricebird answered 27/12, 2016 at 16:11 Comment(0)
3

If you are using SQL Server 2016 or higher, and your table has partitions created based on the column you are trying to delete by (for example a Timestamp column), then you could use this newer command to delete data by partition.

TRUNCATE TABLE table_name WITH ( PARTITIONS ( { <partition_number_expression> | <range> } [ , ...n ] ) )

This will delete the data in the selected partition(s) only. It should be the most efficient way to delete data from part of a table, since it is minimally logged and runs about as fast as a regular truncate, but without removing all the data from the table.
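For example, with hypothetical partition numbers, assuming partitions 1 through 3 hold the months you want to purge:

TRUNCATE TABLE dbo.LargeTable
WITH (PARTITIONS (1 TO 3));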

The drawback is that if your table is not set up with partitions, then you need to go old school and delete the data with the regular approach, and then recreate the table with partitions so that you can do this in the future, which is what I did. I added the partition creation and deletion into the insertion procedure itself. I had a table with 500 million rows, so this was the only option to reduce the deletion time.

For more details refer to the links below: https://learn.microsoft.com/en-us/sql/t-sql/statements/truncate-table-transact-sql?view=sql-server-2017

SQL server 2016 Truncate table with partitions

Below is what I did first to delete the data before I could recreate the table with partitions and the required data in it. This query runs for days during the specified time window until the data is deleted.

:connect <<ServerName>>
use <<DatabaseName>>

SET NOCOUNT ON;
DECLARE @Deleted_Rows INT;
DECLARE @loopnum INT;
DECLARE @msg varchar(100);
DECLARE @FlagDate datetime;
SET @FlagDate =  getdate() - 31;
SET @Deleted_Rows = 1;
SET @loopnum = 1;

/*while (getdate() < convert(datetime,'2018-11-08 14:00:00.000',120))
BEGIN
    RAISERROR( 'WAIT for START' ,0,1) WITH NOWAIT   
    WAITFOR DELAY '00:10:00'
END*/
RAISERROR( 'STARTING PURGE' ,0,1) WITH NOWAIT   

WHILE (1=1)
BEGIN
    WHILE (@Deleted_Rows > 0 AND (datepart(hh, getdate() ) >= 12 AND datepart(hh, getdate() ) <= 20)) -- (getdate() < convert(datetime,'2018-11-08 19:00:00.000',120) )
      BEGIN
       -- Delete some small number of rows at a time
         DELETE TOP (500000)  dbo.<<table_name>>
         WHERE timestamp_column < convert(datetime, @FlagDate,102)
         SET @Deleted_Rows = @@ROWCOUNT;
         WAITFOR DELAY '00:00:01'
         select @msg = 'ROWCOUNT' + convert(varchar,@Deleted_Rows);
         set @loopnum = @loopnum + 1
         if @loopnum > 1000
             begin 
                 begin try
                        DBCC SHRINKFILE (N'<<databasename>>_log' , 0, TRUNCATEONLY)
                        RAISERROR( @msg ,0,1) WITH NOWAIT
                 end try
                 begin catch
                     RAISERROR( 'DBCC SHRINK' ,0,1) WITH NOWAIT  
                 end catch
                 set @loopnum = 1
             end
        END
WAITFOR DELAY '00:10:00'
END 
select getdate()
Indemnity answered 11/11, 2018 at 4:47 Comment(1)
Still doesn't work with FKs, so that's an annoying limitation.Pretzel
2

Another approach:

SET ROWCOUNT 1000 -- each DELETE below affects at most 1000 rows

DECLARE @DATE AS DATETIME = dateadd(MONTH,-7,GETDATE())

DELETE LargeTable  WHERE readTime < @DATE
WHILE @@ROWCOUNT > 0
BEGIN
   DELETE LargeTable  WHERE readTime < @DATE
END
SET ROWCOUNT 0

Optional:

If you want minimal logging, switch the database to the SIMPLE recovery model (the transaction log itself cannot be disabled).

ALTER DATABASE dbname SET RECOVERY SIMPLE;
Shotton answered 20/7, 2017 at 13:21 Comment(0)
2

If you want to do it without a loop, you can use a GOTO statement to delete a large amount of records with SQL Server. For example:

 IsRepeat:
    DELETE TOP (10000)
    FROM <TableName>
    IF @@ROWCOUNT > 0
         GOTO IsRepeat

This way you can delete a large amount of data in smaller-sized deletes.

Let me know if you require more information.

Peursem answered 7/1, 2020 at 13:18 Comment(0)
2

This question is a little old, but I just stumbled onto it looking for assistance. The fastest way to delete a whole bunch of rows, while keeping some, is to create a script that:

  1. Creates a temp table (I used a table variable)

  2. Selects the rows to keep into the temp table

  3. Truncates the target table

  4. Inserts the kept rows back into the target table.


I always test first by selecting the rows in the @tmpSaveTable and rolling back the transaction. I just did 17 million rows in a couple of seconds.

Begin tran
DECLARE @tmpSaveTable   table (
    -- ...your columns, types, etc. go here
)
INSERT @tmpSaveTable (columns here)
SELECT (appropriate columns from target here)
FROM SourceTable
WHERE (which rows to save)
-- appropriate place to test w/ SELECT from @tmpSaveTable
TRUNCATE TABLE SourceTable
INSERT SourceTable (columns)
SELECT (all values) FROM @tmpSaveTable
--Rollback Tran -- for testing
Commit Tran
Dispirit answered 13/5, 2022 at 16:9 Comment(0)
0

If you want to delete the records of a table with a large number of records but keep some of them, you can save the required records in a similar table, truncate the main table, and then copy the saved records back into the main table.

Pulvinus answered 3/4, 2021 at 9:32 Comment(0)
