SQL find duplicate records occuring within 1 minute of each other
Asked Answered
C

2

6

I am checking website entrys that are recorded in a database

columns: browser, click_type_id, referrer, and datetime

if multiple rows have the same browser, click_type_id, and referrer and are timestamped (occur within 1 minute of one another) they are considered a duplicate.

I need a sql statement that can query for these duplicates based on the above criteria.

Any help is appreciated.

Carousel answered 15/12, 2009 at 20:48 Comment(4)
delete? select all? test for and prevent insert?Aikens
What kind of dialect? Ever heard of group by?Damp
An interesting question is what do you consider within 1 minute of another? If there's a row at 12:30:05, one at 12:30:45 and one at 12:31:10, the first two are within a minute of each other, the last two are as well, but the first and last are not. You would have to figure out how to address that as part of your solution.Stalinism
Do you want to return the duplicates or return distinct values (without duplicates)?Mephitic
E
15
SELECT
     T1.browser,
     T1.click_type,
     T1.referrer,
     T1.datetime,
     T2.datetime
FROM
     My_Table T1
INNER JOIN My_Table T2 ON
     T2.browser = T1.browser AND
     T2.click_type = T1.click_type AND
     T2.referrrer = T1.referrer AND
     T2.datetime > T1.datetime AND
     T2.datetime <= DATEADD(mi, 1, T1.datetime)
Envy answered 15/12, 2009 at 20:54 Comment(3)
You could also just take off the t2 browser, click_type, and referrer references in the select since they will always equal t1's.Slaver
Thanks for the answer. However, any idea how can I count number of occurrence within 1 minute. Using count and groupby directly will not yield accurate result. It gave "number of success self join", instead of number of left table before join.Plier
You'd have to clarify specifically what you want. A string of 20 records each within 5 seconds of each other could be 12 within one minute or 8 within another or they could be 4-12-4 or something else. Probably worth starting your own question rather than trying to sort through it in the comments here.Envy
A
1

To prevent inserts

INSERT MyTable (browser, click_type_id, referrer, [datetime])
SELECT
    @browser, @click_type_id, @referrer, @datetime
WHERE
    NOT EXISTS (SELECT *
        FROM
           MyTable M2
        WHERE
           browser = @browser AND click_type_id = @click_type_id AND referrer = @referrer
           AND
           [datetime] < DATEADD(minute, -1, @datetime))

To find in existing data (relies on smalldatetime accuracy and may help to avoid issues as per comment to question)

SELECT
   browser, click_type_id, referrer, COUNT(*)
FROM
   MyTable
GROUP BY
    browser, click_type_id, referrer, (CAST [datetime] AS smalldatetime)
HAVING
    COUNT(*) > 1
Aikens answered 15/12, 2009 at 20:54 Comment(2)
Wouldn't the smalldatetime conversion be a problem if you had, for example, 12:00:25 and 12:01:14?Envy
@Tom H.: Yes, but soemtimes it depends how you define a minute ;-)Aikens

© 2022 - 2024 — McMap. All rights reserved.