Count max. number of concurrent user sessions per day

Situation

We have a PostgreSQL 8.4 database containing user sessions with login date/time and logout date/time per row. Our web application records these times and also handles the case where a user does not log out explicitly (session timeout), so a login and a logout date/time are available in every case.

Goal

I need per-day statistics of the maximum number of concurrent sessions, so that I can say, for example: "On 2015-03-16 the peak number of concurrently logged-in users was six."

Similar questions

A similar question has been answered here: SQL max concurrent sessions per hour of day. However, I was unable to adapt the solution to my case, where I want a result table showing the max. number of concurrent user sessions per day rather than per hour. The table schema also differs slightly: in my case one row contains both the login and the logout date/time, whereas in that example each row represents either a login or a logout. In addition, that question is based on an MS SQL database environment instead of PostgreSQL.

Considerations

  • Sessions of different users can overlap
  • A user might have recurring sessions, which should only be counted once (group by username)
  • The table schema of the session table looks as follows

Table schema:

user_id     |  login_date  |  login_time  |  logout_date  |  logout_time
------------+--------------+--------------+---------------+-------------
USER32      |  2014-03-03  |    08:23:00  |   2014-03-03  |     14:44:00
USER82      |  2014-03-03  |    08:49:00  |   2014-03-03  |     17:18:00
USER83      |  2014-03-03  |    09:40:00  |   2014-03-03  |     17:31:00
USER36      |  2014-03-03  |    09:50:00  |   2014-03-03  |     16:10:00
USER37      |  2014-03-03  |    11:44:00  |   2014-03-03  |     15:21:00
USER72      |  2014-03-03  |    12:52:00  |   2014-03-03  |     12:55:00
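
For illustration only, here is a minimal DDL sketch matching these columns. The table name sessions, the id column and the data types are assumptions, not taken from the original schema:

CREATE TABLE sessions (
   id          serial PRIMARY KEY  -- assumed surrogate key
 , user_id     text NOT NULL
 , login_date  date NOT NULL
 , login_time  time NOT NULL
 , logout_date date NOT NULL
 , logout_time time NOT NULL
);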

Example

The following example, illustrated as a timeline via the Google Charts API, should help to understand the problem: https://i.sstatic.net/02GNg.png

Given this example of the day 2014-03-03, all users except USER78 and USER01 (six users) were logged in between 12:52 and 12:55 that day. This is the maximum number of concurrently logged-in users, and I need such a statistic per day for a given time range.

Day         |   MaxNumberOfConcurrentSessions 
------------+--------------------------------
2015-03-01  |                 2 
2015-03-02  |                 3
2015-03-03  |                 6
...

The timeline screenshot above was generated with the following Google Charts code:

google.setOnLoadCallback(drawChart);
function drawChart() {

  var container = document.getElementById('example5.1');
  var chart = new google.visualization.Timeline(container);
  var dataTable = new google.visualization.DataTable();
  dataTable.addColumn({ type: 'string', id: 'Room' });
  dataTable.addColumn({ type: 'string', id: 'Name' });
  dataTable.addColumn({ type: 'date', id: 'Start' });
  dataTable.addColumn({ type: 'date', id: 'End' });
  dataTable.addRows([
    // JavaScript Date months are 0-based: month 2 = March, so these rows are 2014-03-03
    ["USER78", '', new Date(2014, 2, 3, 20, 38), new Date(2014, 2, 3, 21, 14)],
    ["USER83", '', new Date(2014, 2, 3,  9, 40), new Date(2014, 2, 3, 17, 31)],
    ["USER72", '', new Date(2014, 2, 3,  8, 43), new Date(2014, 2, 3,  8, 43)],
    ["USER72", '', new Date(2014, 2, 3,  9, 40), new Date(2014, 2, 3,  9, 40)],
    ["USER72", '', new Date(2014, 2, 3, 10,  3), new Date(2014, 2, 3, 10,  6)],
    ["USER72", '', new Date(2014, 2, 3, 12, 52), new Date(2014, 2, 3, 12, 55)],
    ["USER72", '', new Date(2014, 2, 3, 21, 13), new Date(2014, 2, 3, 21, 13)],
    ["USER72", '', new Date(2014, 2, 3, 21, 37), new Date(2014, 2, 3, 21, 38)],
    ["USER72", '', new Date(2014, 2, 3, 23, 14), new Date(2014, 2, 3, 23, 15)],
    ["USER72", '', new Date(2014, 2, 3, 23, 27), new Date(2014, 2, 3, 23, 28)],
    ["USER36", '', new Date(2014, 2, 3,  8,  5), new Date(2014, 2, 3,  9, 17)],
    ["USER36", '', new Date(2014, 2, 3,  9, 50), new Date(2014, 2, 3, 16, 10)],
    ["USER36", '', new Date(2014, 2, 3, 16, 12), new Date(2014, 2, 3, 20, 29)],
    ["USER32", '', new Date(2014, 2, 3,  8, 23), new Date(2014, 2, 3, 14, 44)],
    ["USER82", '', new Date(2014, 2, 3,  8, 49), new Date(2014, 2, 3, 17, 18)],
    ["USER37", '', new Date(2014, 2, 3,  8,  4), new Date(2014, 2, 3,  8,  6)],
    ["USER37", '', new Date(2014, 2, 3, 11, 44), new Date(2014, 2, 3, 15, 21)],
    ["USER37", '', new Date(2014, 2, 3, 15, 34), new Date(2014, 2, 3, 15, 51)],
    ["USER37", '', new Date(2014, 2, 3, 16, 12), new Date(2014, 2, 3, 16, 14)],
    ["USER37", '', new Date(2014, 2, 3, 16, 52), new Date(2014, 2, 3, 16, 54)],
    ["USER37", '', new Date(2014, 2, 3, 17,  7), new Date(2014, 2, 3, 17,  8)],
    ["USER37", '', new Date(2014, 2, 3, 20, 20), new Date(2014, 2, 3, 20, 24)],
    ["USER37", '', new Date(2014, 2, 3, 21,  3), new Date(2014, 2, 3, 21, 20)],
    ["USER37", '', new Date(2014, 2, 3, 22, 42), new Date(2014, 2, 3, 23,  5)],
    ["USER37", '', new Date(2014, 2, 3, 23, 51), new Date(2014, 2, 3, 23, 56)],
    ["USER01", '', new Date(2014, 2, 3, 16, 11), new Date(2014, 2, 3, 16, 12)]
  ]);

  var options = {
    timeline: { colorByRowLabel: true }
  };

  chart.draw(dataTable, options);
}
<script type="text/javascript" src="https://www.google.com/jsapi?autoload={'modules':[{'name':'visualization',
       'version':'1','packages':['timeline']}]}"></script>
<div id="example5.1" style="width:5000px;height: 600px;"></div>
   
Lafferty answered 16/3, 2015 at 15:42 Comment(2)
Your examples are confined to a single day. Can sessions span multiple days? Also: have you considered upgrading to a current version? Postgres 8.4 reached EOL last year.Histolysis
Yes, sessions can span multiple days, and yes, we plan to upgrade to Postgres 9.x in the next few months.Lafferty

I would serialize logins and logouts with UNION ALL, counting "in" as 1 and "out" as -1, then compute a running count with a simple window function and take the max per day.

Since it has not been specified, assuming that:

  • "Concurrent" means at the same point in time (not just on the same day).
  • Sessions can span any range of time (i.e. multiple days, too).
  • Each user can only be online once at one point in time. So no grouping per user is needed in my solution.
  • Logout trumps login. If both occur at the same time, logout is counted first (leading to a lower concurrent number in corner cases).

WITH range AS (SELECT '2014-03-01'::date AS start_date  -- time range
                    , '2014-03-31'::date AS end_date)   -- inclusive bounds
, cte AS (
   SELECT *
   FROM   tbl, range r
   WHERE  login_date  <= r.end_date
   AND    logout_date >= r.start_date
   )
, ct AS (
   SELECT log_date, sum(ct) OVER (ORDER BY log_date, log_time, ct) AS session_ct
   FROM  (
      SELECT logout_date AS log_date, logout_time AS log_time, -1 AS ct FROM cte
      UNION ALL
      SELECT login_date, login_time, 1 FROM cte
      ) sub
   )
SELECT log_date, max(session_ct) AS max_sessions
FROM   ct, range r
WHERE  log_date BETWEEN r.start_date AND r.end_date  -- crop actual time range
GROUP  BY 1
ORDER  BY 1;

You might use the OVERLAPS operator in cte:

AND   (login_date, logout_date) OVERLAPS (r.start_date, r.end_date)

But that might not be a good idea because (per documentation):

Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

The upper bound of your range would then have to be the day after your desired time frame.
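
If you still want to use OVERLAPS with inclusive day bounds, a possible adaptation (a sketch, equivalent to the two date comparisons in cte above) is to shift both upper bounds by one day so the half-open semantics cover the inclusive range:

AND   (login_date, logout_date + 1) OVERLAPS (r.start_date, r.end_date + 1)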

Explanation

  • CTEs are available since Postgres 8.4.

  • The 1st CTE range is just for convenience of providing the time range once.

  • The 2nd CTE cte selects only relevant rows: those that ...

    • start before or in the range
    • and end in or after the range
  • The 3rd CTE ct serializes "in" and "out" points with values of +/-1 and computes a running count with the aggregate function sum() used as a window function (window functions are also available since Postgres 8.4). A minimal standalone illustration of this step follows the list.

  • In the final SELECT, trim leading and trailing days and aggregate the maximum per day. Voilà.
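
As referenced above, a minimal standalone sketch of the running-count step over a few made-up in/out events (the values are hypothetical and not taken from the question's data):

SELECT log_time, ct
     , sum(ct) OVER (ORDER BY log_time, ct) AS session_ct  -- running count; ct ASC puts logouts (-1) before logins (+1) at equal times
FROM  (VALUES
         (time '08:23',  1)   -- login
       , (time '08:49',  1)   -- login
       , (time '12:52',  1)   -- login
       , (time '12:55', -1)   -- logout
       , (time '14:44', -1)   -- logout
       ) sub(log_time, ct)
ORDER  BY log_time, ct;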

SQL Fiddle for Postgres 9.6.
Postgres 8.4 is too old and no longer available there, but the query should work the same. I added rows to the test case, one of them spanning multiple days, which should make it more useful.

Notes

I would generally use timestamp instead of date and time. Same size, easier to handle. Or timestamptz if multiple time zones can be involved.
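
A related sketch (assuming the table is called sessions): in Postgres, date + time yields a timestamp, so the existing columns can be combined on the fly or used to populate new timestamp columns:

SELECT user_id
     , login_date  + login_time  AS login_ts   -- date + time = timestamp
     , logout_date + logout_time AS logout_ts
FROM   sessions;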

An index on (login_date, logout_date DESC) is instrumental for performance as a bare minimum.
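
For example (index and table names are placeholders):

CREATE INDEX sessions_login_logout_idx ON sessions (login_date, logout_date DESC);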

Histolysis answered 16/3, 2015 at 16:4 Comment(9)
I think you made a typo in the WHERE clause of cte AS (...); it should be WHERE login_date >= r.start_date AND logout_date <= r.end_date. Otherwise the result of the statement is empty... right?Lafferty
I've tested your query including my comment above against data between 2014-01-01 and 2014-12-31. Here you see an excerpt: 2014-01-02=21, 2014-01-03=32, 2014-01-04=29, 2014-01-06=48, ... 2014-07-08=854, 2014-07-09=850, ... 2014-09-23=569, 2014-09-24=564, ... 2014-12-24=18, 2014-12-29=14, 2014-12-30=11, 2014-12-31=0 The result looks like a bell shape with peak values in the middle of the year, scaling down to zero at the beginning/end of the year...Lafferty
@Mike: Fixed a typo in the column name, and the logic was off, too, but in a different way. I had designed it for a single day at first and did not adapt to the range correctly.Histolysis
Still seems to be wrong with my data: my test value is 6 for 2014-03-03 and with your script it's 709. Somehow it is accumulating the sessions =/Lafferty
I finally got through to SQL Fiddle to demonstrate it works as advertised. Can't test on pg 8.4, but should be the same. I suspect you have unexpected data in your table, like long running sessions.Histolysis
A big thank you for your help! You are right, I really had some long-running sessions in the database. Thanks also for the SQL Fiddle demonstrating your solution.Lafferty
@ErwinBrandstetter This works fine, but in my case, two sessions can only be considered as concurrent if they overlap for at least 30 minutes. Any idea on how this query can be modified to include that?Vibraphone
@AthulDilip: That still leaves room for interpretation. Two sessions might overlap for 30 min, and a third session might overlap for 30 min with only one of those, etc. How would you count exactly? I suggest starting a new question with an exact definition. You can always link to this post for reference and drop a comment here to link back.Histolysis
Thanks @ErwinBrandstetter I've added a new question hereVibraphone

My thoughts so far:

  • First, find all possible overlaps between sessions (i.e. an inner join with the overlap condition "(s1.login_time, s1.logout_time) OVERLAPS (s2.login_time, s2.logout_time)")
  • Based on the smallest common timespan, find the max number of concurrent sessions (see the last part of the WHERE clause: "s1.login_time >= s2.login_time AND s1.logout_time <= s2.logout_time")

The SQL statement looks as follows:

SELECT report_date, MAX(concurrent_sessions) AS max_concurrent_sessions
FROM (
  SELECT report_date, session_id, count(session_id) AS concurrent_sessions
  FROM (
    SELECT s1.id AS session_id, s1.user_id, s1.login_date AS report_date, s1.login_time, s1.logout_date, s1.logout_time,
           s2.id, s2.user_id, s2.login_date, s2.login_time, s2.logout_date, s2.logout_time
    FROM sessions s1
    INNER JOIN sessions s2 ON s1.login_date = s2.login_date
    WHERE s1.login_date BETWEEN '2014-03-01' AND '2014-03-31'
      AND (s1.login_time, s1.logout_time) OVERLAPS (s2.login_time, s2.logout_time)
      AND s1.login_time >= s2.login_time
      AND s1.logout_time <= s2.logout_time
    ORDER BY s1.id
  ) AS concurrent_overlapping_sessions
  GROUP BY report_date, session_id
) AS max_concurrent_overlapping_sessions
GROUP BY report_date
ORDER BY report_date;

What do you think about this solution in comparison to the other proposed one (e.g. performance, correctness, etc.)?

Lafferty answered 16/3, 2015 at 16:57 Comment(1)
This approach is incorrect in multiple ways. 1. Data selection has to include sessions starting before the time frame and ending after the time frame. 2. You are counting overlapping time ranges for each row, but that does not mean all (or even any) of these ranges overlap at the same point in time. Consequently, the result is completely wrong. 3. Very expensive, too. With long sessions, this creates a huge Cartesian product. But that's hardly relevant at this point; the whole approach is not getting you where you want to go.Histolysis
