Detect Anomaly Intervals with SQL
Asked Answered
F

4

6

My problem is simple: I have a table with a series of statuses and timestamps (for the sake of curiosity, these statuses indicate alarm levels) and I would like to query this table in order to get duration between two statuses.

Seems simple, but here comes the tricky part: I can´t create look-up tables, procedures and it should be as fast as possible as this table is a little monster holding over 1 billion records (no kidding!)...

The schema is drop dead simple:

[pk] Time Value

(actualy, there is a second pk but it is useless for this)

And below a real world example:

Timestamp          Status
2013-1-1 00:00:00    1
2013-1-1 00:00:05    2
2013-1-1 00:00:10    2
2013-1-1 00:00:15    2
2013-1-1 00:00:20    0
2013-1-1 00:00:25    1
2013-1-1 00:00:30    2
2013-1-1 00:00:35    2
2013-1-1 00:00:40    0

The output, considering only a level 2 alarm, should be as follow should report the begin of a level 2 alarm an its end (when reach 0):

StartTime          EndTime            Interval
2013-1-1 00:00:05  2013-1-1 00:00:20     15
2013-1-1 00:00:30  2013-1-1 00:00:40     10

I have been trying all sorts of inner joins, but all of them lead me to an amazing Cartesian explosion. Can you guys help me figure out a way to accomplish this?

Thanks!

Findlay answered 24/1, 2013 at 19:41 Comment(9)
Which sql server version?Possum
+1 for interesting question. Tried any RANK() in combination with DATEDIFF() ...? I am still thinking if something liek this could work...Harlin
For all intensive purposes, a status of 1 is irrelevant?Suet
Intervals exact 5 seconds?Possum
1bn rows? Urgh, I wonder if (depending on alert level switching frequency) it would be faster as a simple table scan... Otherwise, if the timestamps are always exactly 5 seconds apart, this can be turned into a common 'Gaps and Islands' problem. I kinda doubt that, though.Lava
@Harlin Rank was my first guess, unfortunately I was not able to correlate both ranks on two different filtered groups.Findlay
@njk It is irrelevant. Actually statuses can go from 0 to 10 where 0 is normal and 10 is the worst case.Findlay
@HamletHakobyan No... I wish!! The process that writes the values run every 5 seconds, but this timespan is not always respected.Findlay
-1 (not really) for not caring to create structures. All you need is RANK()/DENSE_RANK(). Partition your query on status and rank it. The status 2 would have the same RANK, and RANK will be different for all other statuses. This is how you find your start and end points...Outgoing
H
4

This has to be one of the harder questions I've seen today - thanks! I assume you can use CTEs? If so, try something like this:

;WITH Filtered
AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY dateField) RN, dateField, Status
    FROM Test    
)
SELECT F1.RN, F3.MinRN,
    F1.dateField StartDate,
    F2.dateField Enddate
FROM Filtered      F1, Filtered F2, (
SELECT F1a.RN, MIN(F3a.RN) as MinRN
FROM Filtered      F1a
   JOIN Filtered F2a ON F1a.RN = F2a.RN+1 AND F1a.Status = 2 AND F2a.Status <> 2
   JOIN Filtered F3a ON F1a.RN < F3a.RN AND F3a.Status <> 2
GROUP BY F1a.RN ) F3 
WHERE F1.RN = F3.RN AND F2.RN = F3.MinRN

And the Fiddle. I didn't add the intervals, but I imagine you can handle that part from here.

Good luck.

Housewares answered 24/1, 2013 at 20:51 Comment(1)
Absolutely amazing! Grouping byRN and filter the min value is brilliant! I will run some testes to see how it behaves on large tables, but as it is using some heavy filtered data, I think it will be fine. If I could I would +2: +1 for the answer and another +1 for introducing me to SQLFiddle that I didn't know it! Thanks!Findlay
L
0

Finally figured out a version I was happy with. It took me remembering an answer from another question (can't remember which one though) where it was pointed out that the difference between two (increasing) sequences was always a constant.

WITH Ordered (occurredAt, status, row, grp) 
             as (SELECT occurredAt, status, 
                        ROW_NUMBER() OVER (ORDER BY occurredat), 
                        ROW_NUMBER() OVER (PARTITION BY status 
                                           ORDER BY occurredat)
                 FROM Alert)

SELECT Event.startDate, Ending.occurredAt as endDate,
       DATEDIFF(second, Event.startDate, Ending.occurredAt) as interval

FROM (SELECT MIN(occurredAt) as startDate, MAX(row) as ending
      FROM Ordered
      WHERE status = 2
      GROUP BY row - grp) Event

LEFT JOIN (SELECT occurredAt, row
           FROM Ordered
           WHERE status != 2) Ending
        ON Event.ending + 1 = Ending.row

(working SQL Fiddle example, with some additional data rows for work checking).

This unfortunately doesn't correctly deal with level-2 statuses that are end rows (behavior unspecified), although it does list them.

Lava answered 24/1, 2013 at 21:42 Comment(0)
D
0

Just for the sake of having an alternative. Tried to do some test on performance, but did not finish.

SELECT
  MIN([main].[Start]) AS [Start],
  [main].[End],
  DATEDIFF(s, MIN([main].[Start]), [main].[End]) AS [Seconds]
FROM
(
  SELECT
    [sub].[Start],
    MIN([sub].[End]) AS [End]
  FROM
  (
    SELECT
      [start].[Timestamp] AS [Start],
      [start].[Status] AS [StartingStatus],
      [end].[Timestamp] AS [End],
      [end].[Status] AS [EndingStatus]
    FROM [Alerts] [start],  [Alerts] [end]
    WHERE [start].[Status] = 2 
      AND [start].[Timestamp] < [end].[Timestamp]
      AND [start].[Status] <> [end].[Status]
  ) AS [sub]
  GROUP BY
    [sub].[Start],
    [sub].[StartingStatus]
) AS [main]
GROUP BY
  [main].[End]

And here is a Fiddle.

Doering answered 24/1, 2013 at 22:29 Comment(0)
F
-1

I do something similar by using id that is an identity to the table.

    create table test(id int primary key identity(1,1),timstamp datetime,val int)

    insert into test(timstamp,val) Values('1/1/2013 00:00:00',1)
    insert into test(timstamp,val) Values('1/1/2013 00:00:05',2)
    insert into test(timstamp,val) Values('1/1/2013 00:00:25',1)
    insert into test(timstamp,val) Values('1/1/2013 00:00:30',2)
    insert into test(timstamp,val) Values('1/1/2013 00:00:35',1)

    select t1.timstamp,t1.val,DATEDIFF(s,t1.timstamp,t2.timstamp) 
    from test t1 left join test t2 on t1.id=t2.id-1

    drop table test

I would also make the timestamps be seconds since 1980 or 2000 or whatever. But then you might not want to do the reverse conversion all the time and so it depends on how often you use the actual time stamp.

Farcical answered 24/1, 2013 at 21:34 Comment(3)
The idea is good, but unfortunately I can't create or change tables on this database, though I'm pretty sure that if I could, this would certainly a valid way to do this!Findlay
Unfortunately the query given is far too simplistic to generate the desired results. However, some sort of psuedo-id is extremely useful in this situation, yes.Lava
What do you mean far too simplistic? I use this on tables with millions of rows to get a report of duration between events. As long as the data is inserted in order by timestamp the query is screaming fast and is exactly what is needed, i.e. duration between events. I would say that yours is far more complex than it needs to be. Mine only won't work if they are inserted out of order. You might play around with it a little and see if you don't agree.Farcical

© 2022 - 2024 — McMap. All rights reserved.