MySQL, how to merge table duplicates entries [duplicate]

Asked 4/10, 2012 at 12:47 Answered 4/10, 2012 at 14:3

Possible Duplicate:
How can I remove duplicate rows?
Remove duplicates using only a MySQL query?

I have a large table with ~14M entries. The table type is MyISAM ans not InnoDB.

Unfortunately, I have some duplicate entries in this table that I found with the following request :

SELECT device_serial, temp, tstamp, COUNT(*) c FROM up_logs GROUP BY device_serial, temp, tstamp HAVING c > 1

To avoid these duplicates in the future, I want to convert my current index to a unique constraint using SQL request :

ALTER TABLE  up_logs DROP INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL,
ALTER TABLE up_logs ADD INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL (  `tstamp` ,  `device_serial` )

But before that, I need to clean up my duplicates!

My question is : How can I keep only one entry of my duplicated entries? Keep in mind that my table contain 14M entries, so I would like avoid loops if it is possible.

Any comments are welcome!

Kokand answered 4/10, 2012 at 12:47 Comment(2)

possibly accepted answer on this question will be helpful for you too: #19432 – Eelpout 4/10, 2012 at 13:1

do you have any ID or something unique? can you show structure of the table? – Lenwood 4/10, 2012 at 13:21

Creating a new unique key on the over columns you need to have as uniques will automatically clean the table of any duplicates.

ALTER IGNORE TABLE `table_name`
    ADD UNIQUE KEY `key_name`(`column_1`,`column_2`);

The IGNORE part does not allow the script to terminate after the first error occurs. And the default behavior is to delete the duplicates.

Pen answered 4/10, 2012 at 12:55 Comment(2)

Thank you, your solution is perfect and very efficient – Kokand 5/10, 2012 at 7:11

As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error. – Sik 25/4, 2019 at 20:31

Since MySQL allows Subqueries in update/delete statements, but not if they refer to the table you want to update, I´d create a copy of the original table first. Then:

DELETE FROM original_table 
WHERE id NOT IN( 
    SELECT id FROM copy_table 
    GROUP BY column1, column2, ...
);

But I could imagine that copying a table with 14M entries takes some time... selecting the items to keep when copying might make it faster:

INSERT INTO copy_table 
    SELECT * FROM original_table 
    GROUP BY column1, column2, ...;

and then

DELETE FROM original_table 
WHERE id IN(
    SELECT id FROM copy_table
);

It was some time since I used MySQL and SQL in general last time, so I´m quite sure that there is something with better performance - but this should work ;)

Reichenberg answered 4/10, 2012 at 14:3 Comment(0)

This is how you can delete duplicate rows... I'll write you my example and you'll need to apply to your code. I have Actors table with ID and I want to delete the rows with repeated first_name

mysql> select actor_id, first_name from actor_2;
+----------+-------------+
| actor_id | first_name  |
+----------+-------------+
|        1 | PENELOPE    |
|        2 | NICK        |
|        3 | ED          |
....
|      199 | JULIA       |
|      200 | THORA       |
+----------+-------------+

200 rows in set (0.00 sec)

-Now I use a Variable called @a to get the ID if the next row have the same first_name(repeated, null if it's not).

mysql> select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name;
+---------------+----------------+
|  first_names  | @a:=first_name |
+---------------+----------------+
|          NULL | ADAM           |
|            71 | ADAM           |
|          NULL | AL             |
|          NULL | ALAN           |
|          NULL | ALBERT         |
|           125 | ALBERT         |
|          NULL | ALEC           |
|          NULL | ANGELA         |
|           144 | ANGELA         |
...
|          NULL | WILL           |
|          NULL | WILLIAM        |
|          NULL | WOODY          |
|            28 | WOODY          |
|          NULL | ZERO           |
+---------------+----------------+
200 rows in set (0.00 sec)

-Now we can get only duplicates ID:

    mysql> select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1;
    +-------------+
    | first_names |
    +-------------+
    |        NULL |
    |          71 |
    |        NULL |
     ...
    |          28 |
    |        NULL |
    +-------------+
    200 rows in set (0.00 sec)

-the Final Step, Lets DELETE!

mysql> delete from actor_2 where actor_id in (select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1);
Query OK, 72 rows affected (0.01 sec)

-Now lets check our table:

mysql> select count(*) from actor_2 group by first_name;
+----------+
| count(*) |
+----------+
|        1 |
|        1 |
|        1 |
...
|        1 |
+----------+
128 rows in set (0.00 sec)

it works, if you have any question write me back

Lenwood answered 4/10, 2012 at 13:37 Comment(0)

Recommended topics

Hot tags