Finding simultaneous events in a database between times
Asked Answered
E

6

15

I have a database that stores phone call records. Each phone call record has a start time and an end time. I want to find out what is the maximum amount of phone calls that are simultaneously happening in order to know if we have exceed the amount of available phone lines in our phone bank. How could I go about solving this problem?

Eileneeilis answered 15/6, 2010 at 11:42 Comment(1)
use a numbers table or a CTE to generate a row for each second between your sample range's start and end dates, then just join your calls to that where the generated row's time is between the call start time and end time, add in a group by and count and you are there.Comeon
T
16

Disclaimer: I'm writing my answer based on the (excelent) following post:

https://www.itprotoday.com/sql-server/calculating-concurrent-sessions-part-3 (Part1 and 2 are recomended also)

The first thing to understand here with that problem is that most of the current solutions found in the internet can have basically two issues

  • The result is not the correct answer (for example if range A overlaps with B and C but B dosen't overlaps with C they count as 3 overlapping ranges).
  • The way to compute it is very innefficient (because is O(n^2) and / or they cicle for each second in the period)

The common performance problem in solutions like the proposed by Unreasons is a cuadratic solution, for each call you need to check all the other calls if they are overlaped.

there is an algoritmical linear common solution that is list all the "events" (start call and end call) ordered by date, and add 1 for a start and substract 1 for a hang-up, and remember the max. That can be implemented easily with a cursor (solution proposed by Hafhor seems to be in that way) but cursors are not the most efficient ways to solve problems.

The referenced article has excelent examples, differnt solutions, performance comparison of them. The proposed solution is:

WITH C1 AS
(
  SELECT starttime AS ts, +1 AS TYPE,
    ROW_NUMBER() OVER(ORDER BY starttime) AS start_ordinal
  FROM Calls

  UNION ALL

  SELECT endtime, -1, NULL
  FROM Calls
),
C2 AS
(
  SELECT *,
    ROW_NUMBER() OVER(  ORDER BY ts, TYPE) AS start_or_end_ordinal
  FROM C1
)
SELECT MAX(2 * start_ordinal - start_or_end_ordinal) AS mx
FROM C2
WHERE TYPE = 1

Explanation

suppose this set of data

+-------------------------+-------------------------+
|        starttime        |         endtime         |
+-------------------------+-------------------------+
| 2009-01-01 00:02:10.000 | 2009-01-01 00:05:24.000 |
| 2009-01-01 00:02:19.000 | 2009-01-01 00:02:35.000 |
| 2009-01-01 00:02:57.000 | 2009-01-01 00:04:04.000 |
| 2009-01-01 00:04:12.000 | 2009-01-01 00:04:52.000 |
+-------------------------+-------------------------+

This is a way to implement with a query the same idea, adding 1 for each starting of a call and substracting 1 for each ending.

  SELECT starttime AS ts, +1 AS TYPE,
    ROW_NUMBER() OVER(ORDER BY starttime) AS start_ordinal
  FROM Calls

this part of the C1 CTE will take each starttime of each call and number it

+-------------------------+------+---------------+
|           ts            | TYPE | start_ordinal |
+-------------------------+------+---------------+
| 2009-01-01 00:02:10.000 |    1 |             1 |
| 2009-01-01 00:02:19.000 |    1 |             2 |
| 2009-01-01 00:02:57.000 |    1 |             3 |
| 2009-01-01 00:04:12.000 |    1 |             4 |
+-------------------------+------+---------------+

Now this code

  SELECT endtime, -1, NULL
  FROM Calls

Will generate all the "endtimes" without row numbering

+-------------------------+----+------+
|         endtime         |    |      |
+-------------------------+----+------+
| 2009-01-01 00:02:35.000 | -1 | NULL |
| 2009-01-01 00:04:04.000 | -1 | NULL |
| 2009-01-01 00:04:52.000 | -1 | NULL |
| 2009-01-01 00:05:24.000 | -1 | NULL |
+-------------------------+----+------+

Now making the UNION to have the full C1 CTE definition, you will have both tables mixed

+-------------------------+------+---------------+
|           ts            | TYPE | start_ordinal |
+-------------------------+------+---------------+
| 2009-01-01 00:02:10.000 |    1 |             1 |
| 2009-01-01 00:02:19.000 |    1 |             2 |
| 2009-01-01 00:02:57.000 |    1 |             3 |
| 2009-01-01 00:04:12.000 |    1 |             4 |
| 2009-01-01 00:02:35.000 | -1   |     NULL      |
| 2009-01-01 00:04:04.000 | -1   |     NULL      |
| 2009-01-01 00:04:52.000 | -1   |     NULL      |
| 2009-01-01 00:05:24.000 | -1   |     NULL      |
+-------------------------+------+---------------+

C2 is computed sorting and numbering C1 with a new column

C2 AS
(
  SELECT *,
    ROW_NUMBER() OVER(  ORDER BY ts, TYPE) AS start_or_end_ordinal
  FROM C1
)

+-------------------------+------+-------+--------------+
|           ts            | TYPE | start | start_or_end |
+-------------------------+------+-------+--------------+
| 2009-01-01 00:02:10.000 |    1 | 1     |            1 |
| 2009-01-01 00:02:19.000 |    1 | 2     |            2 |
| 2009-01-01 00:02:35.000 |   -1 | NULL  |            3 |
| 2009-01-01 00:02:57.000 |    1 | 3     |            4 |
| 2009-01-01 00:04:04.000 |   -1 | NULL  |            5 |
| 2009-01-01 00:04:12.000 |    1 | 4     |            6 |
| 2009-01-01 00:04:52.000 |   -1 | NULL  |            7 |
| 2009-01-01 00:05:24.000 |   -1 | NULL  |            8 |
+-------------------------+------+-------+--------------+

And there is where the magic occurs, at any time the result of #start - #ends is the amount of cocurrent calls at this moment.

for each Type = 1 (start event) we have the #start value in the 3rd column. and we also have the #start + #end (in the 4th column)

#start_or_end = #start + #end

#end = (#start_or_end - #start)

#start - #end = #start - (#start_or_end - #start)

#start - #end = 2 * #start - #start_or_end

so in SQL:

SELECT MAX(2 * start_ordinal - start_or_end_ordinal) AS mx
FROM C2
WHERE TYPE = 1

In this case with the proposed set of calls, the result is 2.

In the proposed article, there is a little improvment to have a grouped result by for example a service or a "phone company" or "phone central" and this idea can also be used to group for example by time slot and have the maximum concurrency hour by hour in a given day.

Tokay answered 29/9, 2015 at 22:59 Comment(2)
This is really good solution, I adopted it for my needs and it really works as a charm. Smart idea. It was also easy to extend it to contain various information from source table (and joined tables) and it hadn't noticeable impact on the performance.Outguard
Superb! I however have calls that end the second they start. I had to change "ORDER BY ts, TYPE" into "ORDER BY ts, TYPE DESC", to have the endtime appear after the starttime in C2. Whether these types of calls should be taken into account is another matter.Underpass
J
10

Given the fact that the maximum number of connections is going to be a StartTime points, you can

SELECT TOP 1 count(*) as CountSimultaneous
FROM PhoneCalls T1, PhoneCalls T2
WHERE
     T1.StartTime between T2.StartTime and T2.EndTime
GROUP BY
     T1.CallID
ORDER BY CountSimultaneous DESC

The query will return for each call the number of simultaneous calls. Either order them descending and select first one or SELECT MAX(CountSimultaneous) from the above (as subquery without ordering and without TOP).

Jolynnjon answered 15/6, 2010 at 12:5 Comment(6)
+1, this works slick! I had to reread the question, but I thought the OP wanted the number of calls and the time it happened. This is much better for only the max number of calls, which is what the OP asked for. My version gives the number of calls per the time, which is useful but not exactly what was asked.Comeon
Actually if you stick MIN(T2.EndTime) and get StartTime from T1 you will also get the start and the end of the period with the maximum number of the concurrent calls without further I/O.Jolynnjon
Continually returning a conversion error "Conversion failed when converting date and/or time from character string."Wanettawanfried
Very elegant. But also very slow :( 15 seconds on 10k records, hours on 1 mln... mysql, i7, fully indexed table. 'will have to figure out how to de-normalize the data.Wiese
I wouldn't really call it that elegant (it is not pruning the results already considered i.e. it recalculates number of concurrent calls for all calls that make the concurrent group of calls), but it is simple. In production scenarios you would probably be better off with realtime stats (i.e. transitionally updating the stats on each insert and absorbin I/O penalty there for faster anlaytics later). CTE might be applicable to the above approach to prune more efficiently, but I have not thought it through.Jolynnjon
Also, rewriting the WHERE as join might lead to better performance on some engines and fully indexed sounds not correct - try with composite indexes on Start and End time.Jolynnjon
C
2

try this:

DECLARE @Calls table (callid int identity(1,1), starttime datetime, endtime datetime)
INSERT @Calls (starttime,endtime) values ('6/12/2010 10:10am','6/12/2010 10:15am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 11:10am','6/12/2010 10:25am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 12:10am','6/12/2010 01:15pm')
INSERT @Calls (starttime,endtime) values ('6/12/2010 11:10am','6/12/2010 10:35am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 12:10am','6/12/2010 12:15am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 10:10am','6/12/2010 10:15am')


DECLARE @StartDate datetime
       ,@EndDate datetime
SELECT @StartDate='6/12/2010'
      ,@EndDate='6/13/2010'
;with AllDates AS
(
    SELECT @StartDate AS DateOf
    UNION ALL
    SELECT DATEADD(second,1,DateOf) AS DateOf
        FROM AllDates
    WHERE DateOf<@EndDate
)
SELECT
    a.DateOf,COUNT(c.callid) AS CountOfCalls
    FROM AllDates           a
        INNER JOIN @Calls   c ON a.DateOf>=c.starttime and a.DateOf<=c.endtime
    GROUP BY a.DateOf
    ORDER BY 2 DESC
    OPTION (MAXRECURSION 0)

OUTPUT:

DateOf                  CountOfCalls
----------------------- ------------
2010-06-12 10:10:00.000 3
2010-06-12 10:10:01.000 3
2010-06-12 10:10:02.000 3
2010-06-12 10:10:03.000 3
2010-06-12 10:10:04.000 3
2010-06-12 10:10:05.000 3
2010-06-12 10:10:06.000 3
2010-06-12 10:10:07.000 3
2010-06-12 10:10:08.000 3
2010-06-12 10:10:09.000 3
2010-06-12 10:10:10.000 3
2010-06-12 10:10:11.000 3
2010-06-12 10:10:12.000 3
2010-06-12 10:10:13.000 3
2010-06-12 10:10:14.000 3
2010-06-12 10:10:15.000 3
2010-06-12 10:10:16.000 3
2010-06-12 10:10:17.000 3
2010-06-12 10:10:18.000 3
2010-06-12 10:10:19.000 3
2010-06-12 10:10:20.000 3
2010-06-12 10:10:21.000 3
2010-06-12 10:10:22.000 3
2010-06-12 10:10:23.000 3
2010-06-12 10:10:24.000 3
2010-06-12 10:10:25.000 3
2010-06-12 10:10:26.000 3
2010-06-12 10:10:27.000 3
....

add a TOP 1 or put this query in a derived table and further aggergate it if necessary.

Comeon answered 15/6, 2010 at 12:8 Comment(1)
Very inneficient for big quantity of Data. x) is O(n^2) and can be solved in O(n) see my other answer x) it calculates the superposition at each second of the day, when it can perform it only at each "start_time" moment... AllDates CTE can be filled with Select distinct starttime from @Calls order by starttimeTokay
P
1
SELECT COUNT(*) FROM calls 
    WHERE '2010-06-15 15:00:00' BETWEEN calls.starttime AND calls.endtime

and repeat this for every second.

Prerequisite answered 15/6, 2010 at 11:51 Comment(4)
Why did this get down-voted? We are trying to do something similar now, and the answer that has the most votes here, the one with CountSimultaneous in the query ... it's great. Works awesome! But then you try to apply that query, with it's cross join, on the real production table?? The one with millions of rows??? Eeek! So now we are thinking about doing this on the second level and union'ing it and getting a max. That's just one idea, but man, to down-vote this? Not all tables in production can have cross joins on them all crazy like the "good answer"!Cuthburt
@Cuthburt Well, it's ugly. And you need to process this from the outside. It works, for a suitable value of "works." ;)Prerequisite
sure, it's maybe not ideal, and yes, we would be doing something like this in code, and there we run the query, but to get down-voted for throwing out a simple idea/alternative? I just don't agree with you getting down-voted! I really think that "down-voters" should have to explain themselves, and if they don't have a good explanation ... THEY get down-voted! lolCuthburt
@JustLooking: Don't take the up/downvotes personally, it's a semi-random number granted by strangers ;)Prerequisite
A
0

The only practical method I can think of is as follows:

Split the period you want to analyze in arbitrary "buckets", say, 24 1-hour buckets over the day. For each Bucket count how many calls either started or finished between the start or the end of the interval

Note that the 1-hour limit is not a hard-and-fast rule. You could make this shorter or longer, depending on how precise you want the calculation to be. You could make the actual "length" of the bucket a function of the average call duration. So, let's assume that your average call is 3 minutes. If it is not too expensive in terms of calculations, use buckets that are 3 times longer than your average call (9 minutes) this should be granular enough to give precise results.

Alkoran answered 15/6, 2010 at 11:54 Comment(0)
A
0
-- assuming calls table with columns starttime and endtime
declare @s datetime, @e datetime;
declare @t table(d datetime);
declare c cursor for select starttime,endtime from calls order by starttime;
open c
while(1=1) begin
  fetch next from c into @s,@e
  if @@FETCH_STATUS<>0 break;
  update top(1) @t set d=@e where d<=@s;
  if @@ROWCOUNT=0 insert @t(d) values(@e);
end
close c
deallocate c

select COUNT(*) as MaxConcurrentCalls from @t
Accad answered 12/10, 2013 at 2:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.