Post optimization needed after deleting rows in a MYSQL Database
Asked Answered
N

2

5

I have a log table that is currently 10GB. It has a lot of data for the past 2 years, and I really feel at this point I don't need so much in there. Am I wrong to assume it is not good to have years of data in a table (a smaller table is better)?

My tables all have an engine of MYISAM.

I would like to delete all data of 2014 and 2015, and soon i'll do 2016, but i'm concerned about after I run the DELETE statement, what exactly will happen. I understand because it's ISAM there is a lock that will occur where no writing can take place? I would probably delete data by the month, and do it late at night, to minimize this as it's a production DB.

My prime interest, specifically, is this: should I take some sort of action after this deletion? Do I need to manually tell MYSQL to do anything to my table, or is MYSQL going to do all the housekeeping itself, reclaiming everything, reindexing, and ultimately optimizing my table after the 400,000k records I'll be deleting.

Thanks everyone!

Nadeen answered 19/12, 2016 at 4:50 Comment(5)
Yes, MySQL should clean up by itself, eventually, no, there is nothing wrong with having a table with lots of data in it. This in fact why databases were created, and if you index properly, queries might still even run fast even with your legacy data there. You could also consider moving old data to an archive table somewhere.Indo
why don't you just wait two more weeks and rename the old table and then make a new one in it's place, one in 2 seconds (provided there's no processing holding a lock on the existing table)Transmigrate
@TimBiegeleisen and e4c5, I believe that your comments converted to answers with additional details will greatly help readers of this question.Calci
wow I really like this idea, e4c5.. just rename the damn thing :) put that as an answer I love itNadeen
Tim I like your answer too :) sorryNadeen
L
7

Plan A: Use a time-series PARTITIONing of the table so that future deletions are 'instantaneous' because of DROP PARTITION. More discussion here . Partitioning only works if you will be deleting all rows older than X.

Plan B: To avoid lengthy locking, chunk the deletes. See here . This is optionally followed by an OPTIMIZE TABLE to reclaim space.

Plan C: Simply copy over what you want to keep, then abandon the rest. This is especially good if you need to preserve only a small proportion of the table.

CREATE TABLE new LIKE real;
INSERT INTO new
    SELECT * FROM real
        WHERE ... ;  -- just the newer rows;
RENAME TABLE real TO old, new TO real;   -- instantaneous and atomic
DROP TABLE old;    -- after verifying that all went well.

Note: The .MYD file contains the data; it will never shrink. Deletes will leave holes in it. Further inserts (and opdates) will use the holes in preference to growing the table. Plans A and C (but not B) will avoid the holes, and truly free up space.

Leeanneleeboard answered 20/12, 2016 at 22:16 Comment(4)
Thanks so much for providing the queries, as I wanted to embark on plan C anyway in 2017. I never knew you could do 'like' and have it mean clone the table structure (that's what it does, right?). if i don't want to copy any data, simply leave out your 2nd line for insert into select, right?Nadeen
Yes, that is what LIKE does in that context. Yes, leave out the INSERT if you don't need the data. Nothing will notice the shift.Leeanneleeboard
@Nadeen - Did you do Plan C in 2017? How did it turn out?Leeanneleeboard
yes I ended up doing Plan C, I've been very happy with this thank you :)Nadeen
C
3

Tim and e4c5 have given some good recommendations and I urge them to add their answers.

You can run OPTIMIZE TABLE after doing the deletes. Optimize table will help you with a few things (taken from the docs):

  • If the table has deleted or split rows, repair the table.
  • If the index pages are not sorted, sort them.
  • If the table's statistics are not up to date (and the repair could not be accomplished by sorting the index), update them.

According to the docs: http://dev.mysql.com/doc/refman/5.7/en/optimize-table.html

Use OPTIMIZE TABLE in these cases, depending on the type of table:

...

After deleting a large part of a MyISAM or ARCHIVE table, or making many changes to a MyISAM or ARCHIVE table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly.

Calci answered 19/12, 2016 at 5:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.