Select statement to find duplicates on certain fields
Asked Answered
U

9

429

Can you help me with SQL statements to find duplicates on multiple fields?

For example, in pseudo code:

select count(field1,field2,field3) 
from table 
where the combination of field1, field2, field3 occurs multiple times

and from the above statement if there are multiple occurrences I would like to select every record except the first one.

Usable answered 13/12, 2010 at 22:30 Comment(1)
your pseudo code is ambiguous, plus you don't define order according to which you don't want the first. i suggest you give some sample data.Fidellia
F
864

To get the list of fields for which there are multiple records, you can use..

select field1,field2,field3, count(*)
  from table_name
  group by field1,field2,field3
  having count(*) > 1

Check this link for more information on how to delete the rows.

http://support.microsoft.com/kb/139444

There should be a criterion for deciding how you define "first rows" before you use the approach in the link above. Based on that you'll need to use an order by clause and a sub query if needed. If you can post some sample data, it would really help.

Faenza answered 13/12, 2010 at 22:37 Comment(0)
C
46

You mention "the first one", so I assume that you have some kind of ordering on your data. Let's assume that your data is ordered by some field ID.

This SQL should get you the duplicate entries except for the first one. It basically selects all rows for which another row with (a) the same fields and (b) a lower ID exists. Performance won't be great, but it might solve your problem.

SELECT A.ID, A.field1, A.field2, A.field3
  FROM myTable A
 WHERE EXISTS (SELECT B.ID
                 FROM myTable B
                WHERE B.field1 = A.field1
                  AND B.field2 = A.field2
                  AND B.field3 = A.field3
                  AND B.ID < A.ID)
Centro answered 13/12, 2010 at 22:39 Comment(0)
C
19

This is a fun solution with SQL Server 2005 that I like. I'm going to assume that by "for every record except for the first one", you mean that there is another "id" column that we can use to identify which row is "first".

SELECT id
    , field1
    , field2
    , field3
FROM
(
    SELECT id
        , field1
        , field2
        , field3
        , RANK() OVER (PARTITION BY field1, field2, field3 ORDER BY id ASC) AS [rank]
    FROM table_name
) a
WHERE [rank] > 1
Chickenlivered answered 13/12, 2010 at 23:3 Comment(3)
Just noticed the SQL Server 2008 tag. Glad my suggestion is still valid.Chickenlivered
Excellent solution because it also returns the rows that will need to be deleted from the table in questionWarlord
it helps to think of the PARTITION BY field list as a listing of PK fieldsSaunders
P
6

To see duplicate values:

with MYCTE  as (
    select row_number() over ( partition by name  order by name) rown, *
    from tmptest  
    ) 
select * from MYCTE where rown <=1
Pallbearer answered 29/11, 2013 at 11:58 Comment(0)
L
3

If you're using SQL Server 2005 or later (and the tags for your question indicate SQL Server 2008), you can use ranking functions to return the duplicate records after the first one if using joins is less desirable or impractical for some reason. The following example shows this in action, where it also works with null values in the columns examined.

create table Table1 (
 Field1 int,
 Field2 int,
 Field3 int,
 Field4 int 
)

insert  Table1 
values    (1,1,1,1)
        , (1,1,1,2)
        , (1,1,1,3)
        , (2,2,2,1)
        , (3,3,3,1)
        , (3,3,3,2)
        , (null, null, 2, 1)
        , (null, null, 2, 3)

select    *
from     (select      Field1
                    , Field2
                    , Field3
                    , Field4
                    , row_number() over (partition by   Field1
                                                      , Field2
                                                      , Field3
                                         order by       Field4) as occurrence
          from      Table1) x
where     occurrence > 1

Notice after running this example that the first record out of every "group" is excluded, and that records with null values are handled properly.

If you don't have a column available to order the records within a group, you can use the partition-by columns as the order-by columns.

Litchfield answered 13/12, 2010 at 22:59 Comment(0)
T
1
CREATE TABLE #tmp
(
    sizeId Varchar(MAX)
)

INSERT  #tmp 
    VALUES ('44'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46'),
        ('44,45,46')


SELECT * FROM #tmp
DECLARE @SqlStr VARCHAR(MAX)

SELECT @SqlStr = STUFF((SELECT ',' + sizeId
              FROM #tmp
              ORDER BY sizeId
              FOR XML PATH('')), 1, 1, '') 


SELECT TOP 1 * FROM (
select items, count(*)AS Occurrence
  FROM dbo.Split(@SqlStr,',')
  group by items
  having count(*) > 1
  )K
  ORDER BY K.Occurrence DESC    
Trumpeter answered 19/11, 2016 at 10:55 Comment(0)
W
1

Try this query to have a separate count of each SELECT statement:

select field1, count(field1) as field1Count, field2,count(field2) as field2Counts, field3, count(field3) as field3Counts
from table_name
group by field1, field2, field3
having count(*) > 1
Worser answered 17/2, 2019 at 6:34 Comment(0)
B
0

Try this query to find duplicate records on multiple fields

SELECT a.column1, a.column2
FROM dbo.a a
JOIN (SELECT column1, 
       column2, count(*) as countC
FROM dbo.a 
GROUP BY column4, column5
HAVING count(*) > 1 ) b
ON a.column1 = b.column1
AND a.column2 = b.column2
Bangui answered 23/11, 2020 at 17:12 Comment(0)
H
0

You can also try this query to count a distinct() column and order by with your desired column:

select field1, field2, field3, count(distinct (field2))
from table_name
group by field1, field2, field3
having count(field2) > 1
order by field2;

Hamal answered 3/6, 2022 at 12:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.