How can I delete expired data from a huge table without having the log file grow out of control?

I have a huge table (3 billion rows), which unfortunately contains mostly expired data. I want to simply delete all of these expired rows, and keep the rest.

I can execute a statement like this:

delete from giganticTable where exp_date < getDate()

The execution plan somehow estimates that about 400 million rows will be deleted.

When executed, not only did this not finish after an hour, but the database transaction log file also grew from 6 GB to 90 GB. Note that the database was in the bulk-logged recovery model while this was happening. I eventually canceled the query, since I'm sure there must be a better way to do this.

I have several tables that I need to perform a similar operation to. What's the fastest and most space-efficient way to just delete these rows if I have absolutely no desire to ever recover them?

Note that I'm using Microsoft SQL Server 2005.
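
For reference, log usage and the current recovery model can be checked with something like the following (YourDb is just a placeholder for the database name):

-- Log file size and percentage used, per database
DBCC SQLPERF(LOGSPACE)

-- Confirm the current recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'YourDb'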

Rudolfrudolfo answered 8/5, 2011 at 3:48

I've found it useful when deleting from a table with a large number of rows to delete in batches of, say, 5,000 or so (I usually test to see which value works fastest; sometimes it's 5,000 rows, sometimes 10,000, etc.). This lets each delete operation complete quickly, rather than waiting a long time for one statement to knock out 400 million records.

In SQL Server 2005, something like this should work (please test first, of course):

WHILE EXISTS ( SELECT * FROM giganticTable WHERE exp_date < getDate())
BEGIN
  DELETE TOP(5000) FROM giganticTable WHERE exp_date < getDate()
END

I would see what deleting in batches does to the log file size. If it still blows up the log, you could try changing the Recovery Model to Simple, deleting the records, and then switching back to Bulk-Logged, but only if the system can tolerate the loss of some recent data should a failure occur in that window. I would definitely take a Full Backup before attempting that procedure. This thread also suggests that you could set up a job to back up the log with TRUNCATE_ONLY specified, so that could be another option. Hopefully you have an instance you can test with, but I would start with the batched deletes to see how they affect performance and the log file size.
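
If you do go the route of switching recovery models, a rough sketch would look something like this (YourDb and the backup path are placeholders; test it on a non-production instance first):

-- Only do this if losing point-in-time recovery for this window is acceptable
ALTER DATABASE YourDb SET RECOVERY SIMPLE

-- ... run the batched deletes from above here ...

ALTER DATABASE YourDb SET RECOVERY BULK_LOGGED

-- Start a fresh backup chain once you switch back
BACKUP DATABASE YourDb TO DISK = 'D:\Backups\YourDb_full.bak'

-- The log-truncation option mentioned above (SQL Server 2005 only, deprecated later):
-- BACKUP LOG YourDb WITH TRUNCATE_ONLY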

Amadeo answered 8/5, 2011 at 4:32

You really don't want to mess with anything silly like turning off logging when you're about to do a lot of work on a table, since any issue during the long-running task could easily lead to database corruption and other problems. However, there is a way around your issue.

Create a temp table that matches the schema of your real table. Populate it with the data you want to KEEP. Then, truncate the original table (extremely fast and easy on the log files). Finally, move the data out of the temp table and into your original (and now empty) table.

If the table uses an auto-incrementing (identity) primary key, you will need to force the column to keep the original key values when you copy the rows back in (for example with SET IDENTITY_INSERT), so you don't have issues later.
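
A rough sketch of the idea, assuming an identity column named id and a staging table named giganticTable_keep (adjust the table, column list, and staging table name to your schema, and test it first):

-- Copy only the rows you want to KEEP; SELECT INTO is minimally logged under bulk-logged recovery
SELECT *
INTO giganticTable_keep
FROM giganticTable
WHERE exp_date >= getDate()

-- TRUNCATE is very fast and barely touches the log
-- (it will fail if foreign keys reference this table)
TRUNCATE TABLE giganticTable

-- Keep the original identity values when copying the rows back
SET IDENTITY_INSERT giganticTable ON

INSERT INTO giganticTable (id, exp_date /* , other columns */)
SELECT id, exp_date /* , other columns */
FROM giganticTable_keep

SET IDENTITY_INSERT giganticTable OFF

DROP TABLE giganticTable_keep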

Florafloral answered 8/5, 2011 at 4:00
Have you seen any instances where just changing the Recovery Model led to database corruption? – Amadeo
@Amadeo - No, not from just changing the Recovery Model, but I have seen databases corrupted during a large process (like this will be) when the Recovery Model was set improperly. That can cost you the database. – Florafloral
I see what you're getting at, but having the Recovery Model set to Simple doesn't mean you'll lose your database. It just means you can only restore to your last full backup. I was asking because your answer makes it seem like merely switching Recovery Models will cause corruption, which I have not seen happen personally. – Amadeo
@Amadeo - I see what you mean. Re-reading what I wrote, it does look like I was saying that changing the model itself could cause corruption. I've modified the text a bit to make it clearer. Thanks for pointing that out. – Florafloral

You should have been doing this daily, so you wouldn't end up with such a huge job all at once.
Since you're already in this situation, here are my suggestions:

  1. Split the job into batches, as rsbarro says. You probably don't need the WHILE loop; you can spread the work over several days (see the sketch after this list).
  2. Write the date explicitly, for example:

    delete from giganticTable where exp_date < '2013-08-07'
    
  3. As for the huge transaction log, I don't have a good answer; there doesn't seem to be a really good way around it.
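
For example, something like this spread over several days, with the cutoff dates chosen to keep each day's delete a manageable size (the dates below are just placeholders):

-- day 1
delete from giganticTable where exp_date < '2009-01-01'

-- day 2
delete from giganticTable where exp_date < '2011-01-01'

-- day 3: catch up to the present
delete from giganticTable where exp_date < getDate()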
Beverage answered 7/8, 2013 at 6:23
