Slow self-join delete query
Asked Answered
S

4

1

Does it get any simpler than this query?

delete a.* from matches a
    inner join matches b ON (a.uid = b.matcheduid)

Yes, apparently it does... because the performance on the above query is really bad when the matches table is very large.

matches is about 220 million records. I am hoping that this DELETE query takes the size down to about 15,000 records. How can I improve the performance of the query? I have indexes on both columns. UID and MatchedUID are the only two columns in this InnoDB table, both are of type INT(10) unsigned. The query has been running for over 14 hours on my laptop (i7 processor).

Sosthenna answered 17/8, 2011 at 5:47 Comment(3)
Stupid ideas: Does it resolve any faster with a SELECT? If so, create a temporary table containing the IDs to delete then either join that or use IN (subquery) and see if that's any faster.Scientistic
Other stupid idea: Do you have indexes on those columns?Pig
Yes, indexes are on both columns. Maybe that's why the DELETE is slow? @Dan - I'll try your suggestion.Sosthenna
A
7

Deleting so many records can take a while, I think this is as fast as it can get if you're doing it this way. If you don't want to invest into faster hardware, I suggest another approach:

If you really want to delete 220 million records, so that the table then has only 15.000 records left, thats about 99,999% of all entries. Why not

  1. Create a new table,
  2. just insert all the records you want to survive,
  3. and replace your old one with the new one?

Something like this might work a little bit faster:

/* creating the new table */
CREATE TABLE matches_new
SELECT a.* FROM matches a
LEFT JOIN matches b ON (a.uid = b.matcheduid)
WHERE ISNULL (b.matcheduid)

/* renaming tables */
RENAME TABLE matches TO matches_old;
RENAME TABLE matches_new TO matches;

After this you just have to check and create your desired indexes, which should be rather fast if only dealing with 15.000 records.

Abidjan answered 17/8, 2011 at 6:0 Comment(3)
This is almost a no-brainer. The perfect answer! I'll let you know when the query is done running. Hopefully, it should not take long. :)Sosthenna
@Sosthenna - Yeah, huh? Missed the forest for the trees, is how I feel.Delila
It worked nicely. I went to bed so not sure how long it took, but it was definitely an improvement. @Delila - I should've known it was the DELETE that was slowing me down (because of the indexes). The left outer join while checking NULL is exactly what I wanted, but I couldn't think of it. So frustrating sometimes, right?Sosthenna
C
0

running explain select a.* from matches a inner join matches b ON (a.uid = b. matcheduid) would explain how your indexes are present and being used

Cortex answered 17/8, 2011 at 6:0 Comment(0)
D
0

I might be setting myself up to be roasted here, but in performing a delete operation like this in the midst of a self-join, isn;t the query having to recompute the join index after each deletion?

While it is clunky and brute force, you might consider either:

A. Create a temp table to store the uid's resulting from the inner join, then join to THAT, THEN perfoorm the delete.

OR

B. Add a boolean (bit) typed column, use the join to flag each match (this operation should be FAST), and THEN use:

DELETE * FROM matches WHERE YourBitFlagColumn = True

Then delete the boolean column.

Delila answered 17/8, 2011 at 6:1 Comment(0)
P
0

You probably need to batch your delete. You can do this with a recursive delete using a common table expression or just iterate it on some batch size.

Penrose answered 28/8, 2020 at 21:14 Comment(1)
This would probably work, but it would be helpful to give at least dome kind of example or link.Duisburg

© 2022 - 2024 — McMap. All rights reserved.