I have a table with the following columns:
URL_ID
URL_ADDR
URL_Time
I want to remove duplicates on the URL_ADDR
column using a MySQL query.
Is it possible to do such a thing without using any programming?
I have a table with the following columns:
URL_ID
URL_ADDR
URL_Time
I want to remove duplicates on the URL_ADDR
column using a MySQL query.
Is it possible to do such a thing without using any programming?
Consider the following test case:
CREATE TABLE mytb (url_id int, url_addr varchar(100));
INSERT INTO mytb VALUES (1, 'www.google.com');
INSERT INTO mytb VALUES (2, 'www.microsoft.com');
INSERT INTO mytb VALUES (3, 'www.apple.com');
INSERT INTO mytb VALUES (4, 'www.google.com');
INSERT INTO mytb VALUES (5, 'www.cnn.com');
INSERT INTO mytb VALUES (6, 'www.apple.com');
Where our test table now contains:
SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr |
+--------+-------------------+
| 1 | www.google.com |
| 2 | www.microsoft.com |
| 3 | www.apple.com |
| 4 | www.google.com |
| 5 | www.cnn.com |
| 6 | www.apple.com |
+--------+-------------------+
5 rows in set (0.00 sec)
Then we can use the multiple-table DELETE
syntax as follows:
DELETE t2
FROM mytb t1
JOIN mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);
... which will delete duplicate entries, leaving only the first url based on url_id
:
SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr |
+--------+-------------------+
| 1 | www.google.com |
| 2 | www.microsoft.com |
| 3 | www.apple.com |
| 5 | www.cnn.com |
+--------+-------------------+
3 rows in set (0.00 sec)
UPDATE - Further to new comments above:
If the duplicate URLs will not have the same format, you may want to apply the REPLACE()
function to remove www.
or http://
parts. For example:
DELETE t2
FROM mytb t1
JOIN mytb t2 ON (REPLACE(t2.url_addr, 'www.', '') =
REPLACE(t1.url_addr, 'www.', '') AND
t2.url_id > t1.url_id);
UPDATE
using a self-referential JOIN
and another using a subquery that selects from the same table. The JOIN
test succeeds while the subquery one fails with the 1093 error. The same seems to happen for DELETE
. –
Dever You may want to try the method mentioned at http://labs.creativecommons.org/2010/01/12/removing-duplicate-rows-in-mysql/.
ALTER IGNORE TABLE your_table ADD UNIQUE INDEX `tmp_index` (URL_ADDR);
ALTER IGNORE
is no longer supported as of MySQL 5.7.4: https://mcmap.net/q/246918/-mysql-alter-ignore-table-add-unique-what-will-be-truncated –
Canton This will leave the ones with the highest URL_ID
for a particular URL_ADDR
DELETE FROM table
WHERE URL_ID NOT IN
(SELECT ID FROM
(SELECT MAX(URL_ID) AS ID
FROM table
WHERE URL_ID IS NOT NULL
GROUP BY URL_ADDR ) X) /*Sounds like you would need to GROUP BY a
calculated form - e.g. using REPLACE to
strip out www see Daniel's answer*/
(The derived table 'X' is to avoid the error "You can't specify target table 'tablename' for update in FROM clause")
Well, you could always:
INSERT INTO ... SELECT DISTINCT
into the temp table from original table;INSERT INTO ... SELECT
into the original table from the temp tableIt's clumsy and awkward, and requires several queries (not to mention privileges), but it will do the trick if you don't find another solution.
You can group by on the URL_ADDR which will effectively give you only distinct values in the URL_ADDR field.
select
URL_ID
URL_ADDR
URL_Time
from
some_table
group by
URL_ADDR
Enjoy!
Daniel Vassallo How to for multiple column?
DELETE t2
FROM directory1 t1
JOIN directory1 t2 ON
(t2.page = t1.page,
t2.parentTopic = t1.parentTopic,
t2.title = t1.title,
t2.description = t1.description,
t2.linktype = t1.linktype,
t2.priority = t1.priority
AND t2.linkID > t1.linkID);
maybe like this?
DELETE FROM `your_table`
WHERE
`unique_id` IN (
SELECT
`unique_id`
FROM (
SELECT
`unique_id_in_table`,
ROW_NUMBER() OVER (
PARTITION BY `group_by_column`
ORDER BY `group_by_column`) AS row_num
FROM
`your_table`
) t
WHERE row_num > 1
);
Where:
This will work provided that your URL_ID column is unique.
DELETE FROM url WHERE URL_ID IN (
SELECT URL_ID
FROM url a INNER JOIN (
SELECT URL_ADDR, MAX(URL_ID) MaxURLId
FROM url
GROUP BY URL_ADDR
HAVING COUNT(*) > 1) b ON a.URL_ID <> b.MaxURLId AND a.URL_ADDR = b.URL_ADDR
)
© 2022 - 2024 — McMap. All rights reserved.