I have 7 related tables and on one of the tables, there is a timestamp column and I want to delete all rows older than 30 days. However, these are VERY big deletes. I'm talking tens of millions of records. If I delete all these records from the main table, I have to look at the other 6 tables and delete the associated records from those tables as well.
My question is what is the best way to optimize this?
I'm thinking of using PARTITION
but only one table has the timestamp column. I'm worried that if I drop the old partition in the main table, the related records will still exist in the other 6 tables. The related records are related by the fields (sid, cid).
For context I'm using snort and barnyard which are IDS processors.
I'm using MySQL 5.1.73, MyISAM tables
Here is a snippet from cleanup logs:
StartTime,EndTime,TimeElapsed,AffectedRows
Wed Jan 6 01:00:01 EST 2016,Wed Jan 6 01:45:11 EST 2016,45:10,2911807
Thu Jan 7 01:00:02 EST 2016,Thu Jan 7 01:25:29 EST 2016,25:27,2230255
Fri Jan 8 01:00:01 EST 2016,Fri Jan 8 01:24:18 EST 2016,24:17,1400470
Sat Jan 9 01:00:02 EST 2016,Sat Jan 9 05:47:10 EST 2016,287:8,23360088
Sun Jan 10 01:00:01 EST 2016,Sun Jan 10 10:06:16 EST 2016,546:15,44970072
Mon Jan 11 01:00:01 EST 2016,Mon Jan 11 09:40:39 EST 2016,520:38,43948091
This was my old cleanup script:
/usr/bin/mysql --defaults-extra-file=/old/.my.cnf snort_db >> /root/snortcleaner.log 2>&1 <<EOF
use snort_db;
DROP TRIGGER IF EXISTS delete_old;
DELIMITER //
CREATE TRIGGER delete_old AFTER DELETE ON event
FOR EACH ROW
BEGIN
DELETE FROM data WHERE data.cid = old.cid AND data.sid = old.sid;
DELETE FROM iphdr WHERE iphdr.cid = old.cid AND iphdr.sid = old.sid;
DELETE FROM icmphdr WHERE icmphdr.cid = old.cid AND icmphdr.sid = old.sid;
DELETE FROM tcphdr WHERE tcphdr.cid = old.cid AND tcphdr.sid = old.sid;
DELETE FROM udphdr WHERE udphdr.cid = old.cid AND udphdr.sid = old.sid;
DELETE FROM opt WHERE opt.cid = old.cid AND opt.sid = old.sid;
END //
DELIMITER ;
EOF
# Send the main MySQL command: Deletes all records betweeen the oldest timestamp and 31 days from now()
# Gets the oldest timestamp and ranges a deletion from that to 31 days before now(). If the oldest timestamp is more recent than 31 days, the following command returns 0 anyway. If it is older than 31 days, it will return them
OLDEST_TIMESTAMP=$(mysql --defaults-extra-file=/old/.my.cnf -Dsnort_db -se "SELECT timestamp FROM event ORDER BY timestamp ASC LIMIT 1;")
NUM_AFFECTED=$(mysql --defaults-extra-file=/old/.my.cnf -Dsnort_db -se "DELETE FROM event WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR) AND DATE_SUB(NOW(), INTERVAL 31 DAY); SELECT ROW_COUNT();")
This is my current cleanup script:
DELETE FROM event WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR) AND DATE_SUB(NOW(), INTERVAL 31 DAY);
DELETE FROM data USING data LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM iphdr USING iphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM icmphdr USING icmphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM tcphdr USING tcphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM udphdr USING udphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM opt USING opt LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
I switch back and forth between the two because I don't know which is faster but the reality is that both are too slow.
event
table are you deleting every day? – Amandine