Check for x consecutive days - given timestamps in database
Asked Answered
T

4

19

Could anybody give me an idea or hint how you could check for X consecutive days in a database table (MySQL) where logins (user id, timestamp) are stored?

Stackoverflow does it (e.g. badges like Enthusiast - if you log in for 30 consecutive days or so...). What functions would you have to use or what is the idea of how to do it?

Something like SELECT 1 FROM login_dates WHERE ...?

Tithe answered 18/7, 2012 at 10:13 Comment(2)
Do you mean back from today, or just a range?Hintz
@podiluska: like in general, that's the point :) so, not restricted to the last 30 days, but to see if there are 30 consecutive days at all...Tithe
U
35

You can accomplish this using a shifted self-outer-join in conjunction with a variable. See this solution:

SELECT IF(COUNT(1) > 0, 1, 0) AS has_consec
FROM
(
    SELECT *
    FROM
    (
        SELECT IF(b.login_date IS NULL, @val:=@val+1, @val) AS consec_set
        FROM tbl a
        CROSS JOIN (SELECT @val:=0) var_init
        LEFT JOIN tbl b ON 
            a.user_id = b.user_id AND
            a.login_date = b.login_date + INTERVAL 1 DAY
        WHERE a.user_id = 1
    ) a
    GROUP BY a.consec_set
    HAVING COUNT(1) >= 30
) a

This will return either a 1 or a 0 based on if a user has logged in for 30 consecutive days or more at ANYTIME in the past.

The brunt of this query is really in the first subselect. Let's take a closer look so we can better understand how this works:

With the following example data set:

CREATE TABLE tbl (
  user_id INT,
  login_date DATE
);

INSERT INTO tbl VALUES
(1, '2012-04-01'),  (2, '2012-04-02'),
(1, '2012-04-25'),  (2, '2012-04-03'),
(1, '2012-05-03'),  (2, '2012-04-04'),
(1, '2012-05-04'),  (2, '2012-05-04'),
(1, '2012-05-05'),  (2, '2012-05-06'),
(1, '2012-05-06'),  (2, '2012-05-08'),
(1, '2012-05-07'),  (2, '2012-05-09'),
(1, '2012-05-09'),  (2, '2012-05-11'),
(1, '2012-05-10'),  (2, '2012-05-17'),
(1, '2012-05-11'),  (2, '2012-05-18'),
(1, '2012-05-12'),  (2, '2012-05-19'),
(1, '2012-05-16'),  (2, '2012-05-20'),
(1, '2012-05-19'),  (2, '2012-05-21'),
(1, '2012-05-20'),  (2, '2012-05-22'),
(1, '2012-05-21'),  (2, '2012-05-25'),
(1, '2012-05-22'),  (2, '2012-05-26'),
(1, '2012-05-25'),  (2, '2012-05-27'),
                    (2, '2012-05-28'),
                    (2, '2012-05-29'),
                    (2, '2012-05-30'),
                    (2, '2012-05-31'),
                    (2, '2012-06-01'),
                    (2, '2012-06-02');

This query:

SELECT a.*, b.*, IF(b.login_date IS NULL, @val:=@val+1, @val) AS consec_set
FROM tbl a
CROSS JOIN (SELECT @val:=0) var_init
LEFT JOIN tbl b ON 
    a.user_id = b.user_id AND
    a.login_date = b.login_date + INTERVAL 1 DAY
WHERE a.user_id = 1

Will produce:

Example Result

As you can see, what we are doing is shifting the joined table by +1 day. For each day that is not consecutive with the prior day, a NULL value is generated by the LEFT JOIN.

Now that we know where the non-consecutive days are, we can use a variable to differentiate each set of consecutive days by detecting whether or not the shifted table's rows are NULL. If they are NULL, the days are not consecutive, so just increment the variable. If they are NOT NULL, then don't increment the variable:

Result With Highlighted Groupings

After we've differentiated each set of consecutive days with the incrementing variable, it's then just a simple matter of grouping by each "set" (as defined in the consec_set column) and using HAVING to filter out any set that has less than the specified consecutive days (30 in your example):

Result With Grouping By The consec_set column

Then finally, we wrap THAT query and simply count the number of sets that had 30 or more consecutive days. If there was one or more of these sets, then return 1, otherwise return 0.


See a SQLFiddle step-by-step demo

Untwine answered 18/7, 2012 at 12:34 Comment(5)
Not yet sure what happened with SQL Fiddle this morning, but it is back now.Victoriavictorian
wow, this is amazing! Thank you very much Zane for your effort! And for this comprehensive answer with the example and step-by-step guide! wow! thanks again! :)Tithe
This is one of the best answers on this site. I've give this a +10 if I could.Handicraftsman
but How can I select users who have top consec_count?Accentuation
@Zane-bien i need to reread your answer a couple of times, but it is also a start of a solution for my problem ( i think). In my case i need to find consecutive_days (5, 9 or 30) where a column value has the value P(resent) or S(tay). So not consecutive in the true sense. I just need to keep counting until there are 5 , 9 or 30 instances and spit out the corresponding values. Thing is your SQL Fiddle doesn't seem to work anymore??? I like to see this in action. I will ask a proper question myself, but any tips, pointers or links are highly appreciatedOccupational
C
5

You can add X to timestamp date and chech if distinct( dates ) in this date range is == X:

At least once every day of those 30 days:

SELECT distinct 1 
FROM 
   login_dates l1 
inner join
   login_dates l2
      on l1.user = l2.user and 
         l2.timestamp between l1.timestamp and  
                              date_add( l1.timestamp, Interval X day )
where l1.user = some_user
group by 
   DATE(l1.timestamp)
having 
   count( distinct DATE(l1.timestamp) ) = X

(You don't speack about performance requirements ... ;) )

* Edited * The query for only last X days: east once every day of those 30 days

SELECT distinct 1 
FROM 
   login_dates l1 
where l1.user = some_user
      and l1.timestamp >  date_add( CURDATE() , Interval -X day )
group by
    l1.user
having 
   count( distinct DATE(l1.timestamp) ) = X
Characterize answered 18/7, 2012 at 10:21 Comment(5)
wow, that looks complex :) talking about lets say the last 30 days, would this return results where a user logged in 30 times in the last 30 days too? Or just those who logged in at least once every day of those 30 days?Tithe
this query returns 1 if the user has been logged at least once every day of those 30 days. The query limited for last 30 days is really easy. I will post it.Characterize
oh wow, thank you danihp! you call this "really easy"? great that there are so many clever people on this planet! :)Tithe
so, probably it is better to check each time a login happens for that, so you sort out a lot of troubles, right? Because then you only have to check for the last 30 days...Tithe
second query is more and more light that first one. remember to create a composite index with user and timestamp fields.Characterize
M
1

That's a hard problem to solve with SQL alone.

The core of the problem is that you need to compare dynamic results sets to each other in one query. For example, you need to get all the logins/session IDs for one DATE, then JOIN or UNION them with a list to a grouping of logins from the DATE() (which you could use DATE_ADD to determine). You could do this for N number of consecutive dates. If you have any rows left, then those sessions have been logged in over that period.

Assume the following table:

sessionid int, created date

This query returns all the sessionids that have have rows for the last two days:

select t1.sessionid from logins t1 
  join logins t2 on t1.sessionid=t2.sessionid 
  where t1.created = DATE(date_sub(now(), interval 2 day)) 
    AND t2.created = DATE(date_sub(now(), interval 1 day));

As you can see, the SQL will get gnarly for 30 days. Have a script generate it. :-D

This further assumes that every day, the login table is updated with the session.

I don't know if this actually solves your problem, but I hope I have helped frame the problem.

Good luck.

Mariettemarigold answered 18/7, 2012 at 10:49 Comment(0)
C
1

Wouldn't it be more simple to have an extra column consecutive_days in login_dates table with default value 1. This would indicate the length of consecutive dates ending on that day.

You create an insert after trigger on login_dates where you check if there is an entry for the previous day.

If there is none, then the field would have the default value 1 meaning that a new sequence is started on that date.

If here is an entry for previous day then you change the days_logged_in value from the default 1 to be 1 greater then that of previous day.

Ex:

| date       | consecutive_days |
|------------|------------------|
| 2013-11-13 | 5                |
| 2013-11-14 | 6                |
| 2013-11-16 | 1                |
| 2013-11-17 | 2                |
| 2013-11-18 | 3                |
Coachman answered 18/11, 2013 at 12:7 Comment(4)
Whilst that may work for a new table and system being developed, it doesn't solve the general problem here which is querying data that already exists.Jurisprudence
This question is also over a year old with a very strong accepted answer already. Your input is very much appreciated but you are far more likely to get upvotes and reputation by answering recent questions or older questions with no accepted answer.Jurisprudence
I know it is an old topic, but I was searching for a solution for this same problem myself just before posting my solution. As I was building a new system my solution seemed much more appropriate. I published it as people might stumble upon this thread in the feature (as I did after a year and a half) and if they are building a new system or can easily change the one they are working on it, my solution is more simple and much much faster than the one accepted.Coachman
Yep I agree. Please understand that I didn't intend to come across as critical, I was just moderating your first post and wanted to offer a little guidance. Welcome to SO.Jurisprudence

© 2022 - 2024 — McMap. All rights reserved.