You can use count(*) over (partition by ...)
to select all rows that are duplicate of each other. This approach gives you access to all rows and all columns (unlike group by
which consolidates duplicates rows and makes ungrouped columns inaccessible).
To select the original rows or delete their duplicates use row_number() over (partition by ... order by ...)
.
Sample data
create table t (
id int not null primary key,
name varchar(100),
email varchar(100),
created date
);
insert into t (id, name, email, created) values
(1, 'Alice', '[email protected]', '2021-01-01'),
(2, 'Alice', '[email protected]', '2022-01-01'),
(3, 'Alice', '[email protected]', '2023-01-01'),
(4, 'Bob', '[email protected]', '2021-01-01'),
(5, 'Bob', '[email protected]', '2022-01-01'),
(6, 'John', '[email protected]', '2021-01-01'),
(7, 'Zack', '[email protected]', '2021-01-01');
Select all rows that are duplicate of each other
with cte as (
select t.*, count(*) over (partition by name, email) as dup_count
from t
)
select *
from cte
where dup_count > 1;
Result
{ Alice, [email protected] } is present three times, all three instances are selected
{ John, [email protected] } is present only once, it is excluded
| id | name | email | created | dup_count |
|----|-------|-------------------|------------|-----------|
| 1 | Alice | [email protected] | 2021-01-01 | 3 |
| 2 | Alice | [email protected] | 2022-01-01 | 3 |
| 3 | Alice | [email protected] | 2023-01-01 | 3 |
| 4 | Bob | [email protected] | 2021-01-01 | 2 |
| 5 | Bob | [email protected] | 2022-01-01 | 2 |
Select (or delete) the duplicates
The CTE selects all but the oldest row in each set of duplicates
Some RDBMS support delete from CTEs
Or you may use delete from t where id in (...)
approach
with cte as (
select t.*, row_number() over (partition by name, email order by created) as rn
from t
)
delete
from cte
where rn > 1;
Result after deletion
| id | name | email | created |
|----|-------|-------------------|------------|
| 1 | Alice | [email protected] | 2021-01-01 |
| 4 | Bob | [email protected] | 2021-01-01 |
| 6 | John | [email protected] | 2021-01-01 |
| 7 | Zack | [email protected] | 2021-01-01 |
DB<>Fiddle - SQL Server
DB<>Fiddle - MySQL
DB<>Fiddle - Oracle
name
field in the SELECT. – Meridethmeridian