SQL to determine minimum sequential days of access?
Asked Answered
F

19

128

The following User History table contains one record for every day a given user has accessed a website (in a 24 hour UTC period). It has many thousands of records, but only one record per day per user. If the user has not accessed the website for that day, no record will be generated.

Id      UserId   CreationDate
------  ------   ------------
750997      12   2009-07-07 18:42:20.723
750998      15   2009-07-07 18:42:20.927
751000      19   2009-07-07 18:42:22.283

What I'm looking for is a SQL query on this table with good performance, that tells me which userids have accessed the website for (n) continuous days without missing a day.

In other words, how many users have (n) records in this table with sequential (day-before, or day-after) dates? If any day is missing from the sequence, the sequence is broken and should restart again at 1; we're looking for users who have achieved a continuous number of days here with no gaps.

Any resemblance between this query and a particular Stack Overflow badge is purely coincidental, of course.. :)

Festoonery answered 24/7, 2009 at 6:21 Comment(2)
I got the enthusiast badge after 28(<30) days membership. Mysticism.Byproduct
Are your date's stored as UTC? If so, what happens if a CA resident visits the site at 8am one day and then 8pm the following day? Although he/she visits on consecutive days in the Pacific Time Zone it wouldn't be recorded as such in the DB because the DB is storing times as UTC.Rickart
R
70

The answer is obviously:

SELECT DISTINCT UserId
FROM UserHistory uh1
WHERE (
       SELECT COUNT(*) 
       FROM UserHistory uh2 
       WHERE uh2.CreationDate 
       BETWEEN uh1.CreationDate AND DATEADD(d, @days, uh1.CreationDate)
      ) = @days OR UserId = 52551

EDIT:

Okay here's my serious answer:

DECLARE @days int
DECLARE @seconds bigint
SET @days = 30
SET @seconds = (@days * 24 * 60 * 60) - 1
SELECT DISTINCT UserId
FROM (
    SELECT uh1.UserId, Count(uh1.Id) as Conseq
    FROM UserHistory uh1
    INNER JOIN UserHistory uh2 ON uh2.CreationDate 
        BETWEEN uh1.CreationDate AND 
            DATEADD(s, @seconds, DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate), 0))
        AND uh1.UserId = uh2.UserId
    GROUP BY uh1.Id, uh1.UserId
    ) as Tbl
WHERE Conseq >= @days

EDIT:

[Jeff Atwood] This is a great fast solution and deserves to be accepted, but Rob Farley's solution is also excellent and arguably even faster (!). Please check it out too!

Reviel answered 24/7, 2009 at 6:54 Comment(9)
@Artem: That was what I initially thought but when I thought about it, if you have an index on (UserId, CreationDate), the records will show up consecutively in the index and it should perform well.Viscoid
Upvote for this one, I'm getting results back in ~15 seconds on 500k rows.Davao
Great work Spencer! The query is very very fast and surprisingly simple; ~1 sec versus the ~19 sec of our old query. The mis-count was due to some data errors I had to clean up; once I did that I get a perfect match to our existing (thought to be correct, but slow) query.Festoonery
Truncate the CreateionDate down to days in all these tests (on the right side only or you kill SARG) using DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) This works by subtracting the supplied date from zero--which Microsoft SQL Server interprets as 1900-01-01 00:00:00 and gives the number of days. This value is then re-added to the zero date yielding the same date with the time truncated.Ciapas
Are you thinking in terms of literal days or a timespan of 30 days? I assumed the timespan. IE: 1st login at 1/1/2009 1:00pm and 30th login at 1/30/2009 2:00pm != 30 consecutive days because that's 30 days + 1 hour. Or am I missing the problem?Reviel
all I can tell you is, without IDisposable's change the calculation is incorrect. I personally validated the data myself. Some users with 1 day gaps WOULD get the badge incorrectly.Festoonery
you can also do CAST(CAST(CreationDate AS int) AS DateTime) to truncate the time, as date/time can be represented in sql as in integer number of days, and a fraction of a day.Wikiup
This query has the potential to miss a visit that occurs at 23:59:59.5 - how about changing it to: ON uh2.CreationDate >= uh1.CreationDate AND uh2.CreationDate < DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate) + @days, 0), to mean "Not yet on the 31st day later". Also means you can skip the @seconds calculation.Feather
Just to clarify: for each row of the table, this counts the number of rows that fall less than 30 days in the future for the same user. If that count is 30, it means that there were no interruptions.Plaice
F
150

How about (and please make sure the previous statement ended with a semi-colon):

WITH numberedrows
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY UserID 
                                       ORDER BY CreationDate)
                - DATEDIFF(day,'19000101',CreationDate) AS TheOffset,
                CreationDate,
                UserID
         FROM   tablename)
SELECT MIN(CreationDate),
       MAX(CreationDate),
       COUNT(*) AS NumConsecutiveDays,
       UserID
FROM   numberedrows
GROUP  BY UserID,
          TheOffset  

The idea being that if we have list of the days (as a number), and a row_number, then missed days make the offset between these two lists slightly bigger. So we're looking for a range that has a consistent offset.

You could use "ORDER BY NumConsecutiveDays DESC" at the end of this, or say "HAVING count(*) > 14" for a threshold...

I haven't tested this though - just writing it off the top of my head. Hopefully works in SQL2005 and on.

...and would be very much helped by an index on tablename(UserID, CreationDate)

Edited: Turns out Offset is a reserved word, so I used TheOffset instead.

Edited: The suggestion to use COUNT(*) is very valid - I should've done that in the first place but wasn't really thinking. Previously it was using datediff(day, min(CreationDate), max(CreationDate)) instead.

Rob

Feather answered 24/7, 2009 at 7:37 Comment(10)
invalid column name "Offset". Doesn't like that grouping.Festoonery
jeff, this works on my sql 2008 just fine. maybe you have a case sensitivty problem?Ress
oh you should also add ; before with -> ;withRess
Mladen - no, you should end the previous statement with a semi-colon. ;) Jeff - Ok, put [Offset] instead. I guess Offset is a reserved word. Like I said, I hadn't tested it.Feather
Actually, cast(getdate() as int) has rounding hassles. Maybe subtract 0.5?Feather
@Tundey: the trick is the offset between date day and rownumber, not the tde. The query can be expressed just as well using a table expression instead of tde.Reardon
Just repeating myself, because this is a oft seen issue. Truncate the CreateionDate down to days in all these tests (on the right side only or you kill SARG) using DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) This works by subtracting the supplied date from zero--which Microsoft SQL Server interprets as 1900-01-01 00:00:00 and gives the number of days. This value is then re-added to the zero date yielding the same date with the time truncated.Ciapas
IDisposable - yup, I do that often myself. I just didn't worry about it doing it here. It wouldn't be any faster than casting it to an int, but does have the flexibility to count hours, months, whatever.Feather
I've just written a blog post about solving this with DENSE_RANK() too. tinyurl.com/denserankFeather
Just to clarify: every time a gap appears in the date sequence, it changes the value of TheOffset. Missing one day increase TheOffset by 1; missing two days, by two, etc; hence all days in each consecutive days run has the same value of TheOffset.Plaice
R
70

The answer is obviously:

SELECT DISTINCT UserId
FROM UserHistory uh1
WHERE (
       SELECT COUNT(*) 
       FROM UserHistory uh2 
       WHERE uh2.CreationDate 
       BETWEEN uh1.CreationDate AND DATEADD(d, @days, uh1.CreationDate)
      ) = @days OR UserId = 52551

EDIT:

Okay here's my serious answer:

DECLARE @days int
DECLARE @seconds bigint
SET @days = 30
SET @seconds = (@days * 24 * 60 * 60) - 1
SELECT DISTINCT UserId
FROM (
    SELECT uh1.UserId, Count(uh1.Id) as Conseq
    FROM UserHistory uh1
    INNER JOIN UserHistory uh2 ON uh2.CreationDate 
        BETWEEN uh1.CreationDate AND 
            DATEADD(s, @seconds, DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate), 0))
        AND uh1.UserId = uh2.UserId
    GROUP BY uh1.Id, uh1.UserId
    ) as Tbl
WHERE Conseq >= @days

EDIT:

[Jeff Atwood] This is a great fast solution and deserves to be accepted, but Rob Farley's solution is also excellent and arguably even faster (!). Please check it out too!

Reviel answered 24/7, 2009 at 6:54 Comment(9)
@Artem: That was what I initially thought but when I thought about it, if you have an index on (UserId, CreationDate), the records will show up consecutively in the index and it should perform well.Viscoid
Upvote for this one, I'm getting results back in ~15 seconds on 500k rows.Davao
Great work Spencer! The query is very very fast and surprisingly simple; ~1 sec versus the ~19 sec of our old query. The mis-count was due to some data errors I had to clean up; once I did that I get a perfect match to our existing (thought to be correct, but slow) query.Festoonery
Truncate the CreateionDate down to days in all these tests (on the right side only or you kill SARG) using DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) This works by subtracting the supplied date from zero--which Microsoft SQL Server interprets as 1900-01-01 00:00:00 and gives the number of days. This value is then re-added to the zero date yielding the same date with the time truncated.Ciapas
Are you thinking in terms of literal days or a timespan of 30 days? I assumed the timespan. IE: 1st login at 1/1/2009 1:00pm and 30th login at 1/30/2009 2:00pm != 30 consecutive days because that's 30 days + 1 hour. Or am I missing the problem?Reviel
all I can tell you is, without IDisposable's change the calculation is incorrect. I personally validated the data myself. Some users with 1 day gaps WOULD get the badge incorrectly.Festoonery
you can also do CAST(CAST(CreationDate AS int) AS DateTime) to truncate the time, as date/time can be represented in sql as in integer number of days, and a fraction of a day.Wikiup
This query has the potential to miss a visit that occurs at 23:59:59.5 - how about changing it to: ON uh2.CreationDate >= uh1.CreationDate AND uh2.CreationDate < DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate) + @days, 0), to mean "Not yet on the 31st day later". Also means you can skip the @seconds calculation.Feather
Just to clarify: for each row of the table, this counts the number of rows that fall less than 30 days in the future for the same user. If that count is 30, it means that there were no interruptions.Plaice
V
18

If you can change the table schema, I'd suggest adding a column LongestStreak to the table which you'd set to the number of sequential days ending to the CreationDate. It's easy to update the table at login time (similar to what you are doing already, if no rows exist of the current day, you'll check if any row exists for the previous day. If true, you'll increment the LongestStreak in the new row, otherwise, you'll set it to 1.)

The query will be obvious after adding this column:

if exists(select * from table
          where LongestStreak >= 30 and UserId = @UserId)
   -- award the Woot badge.
Viscoid answered 24/7, 2009 at 6:35 Comment(5)
+1 I was having a similar thought, but with a bit field (IsConsecutive) that would be 1 if there is a record for the previous day, otherwise 0.Preoccupancy
we're not going to change the schema for thisFestoonery
And the IsConsecutive can be a computed column defined in the UserHistory table. You could also make it a materialized (stored) computed column that is created when the row is inserted IFF (if and ONLY if) you always insert the rows in chronological order.Ciapas
(because NOBODY would do a SELECT *, we know adding this computed column will not affect the query plans unless the column is referenced... right guys?!?)Ciapas
it is definitely a valid solution but it's not what I asked for. So I give it a "thumbs sideways"..Festoonery
I
7

Some nicely expressive SQL along the lines of:

select
        userId,
    dbo.MaxConsecutiveDates(CreationDate) as blah
from
    dbo.Logins
group by
    userId

Assuming you have a user defined aggregate function something along the lines of (beware this is buggy):

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Runtime.InteropServices;

namespace SqlServerProject1
{
    [StructLayout(LayoutKind.Sequential)]
    [Serializable]
    internal struct MaxConsecutiveState
    {
        public int CurrentSequentialDays;
        public int MaxSequentialDays;
        public SqlDateTime LastDate;
    }

    [Serializable]
    [SqlUserDefinedAggregate(
        Format.Native,
        IsInvariantToNulls = true, //optimizer property
        IsInvariantToDuplicates = false, //optimizer property
        IsInvariantToOrder = false) //optimizer property
    ]
    [StructLayout(LayoutKind.Sequential)]
    public class MaxConsecutiveDates
    {
        /// <summary>
        /// The variable that holds the intermediate result of the concatenation
        /// </summary>
        private MaxConsecutiveState _intermediateResult;

        /// <summary>
        /// Initialize the internal data structures
        /// </summary>
        public void Init()
        {
            _intermediateResult = new MaxConsecutiveState { LastDate = SqlDateTime.MinValue, CurrentSequentialDays = 0, MaxSequentialDays = 0 };
        }

        /// <summary>
        /// Accumulate the next value, not if the value is null
        /// </summary>
        /// <param name="value"></param>
        public void Accumulate(SqlDateTime value)
        {
            if (value.IsNull)
            {
                return;
            }
            int sequentialDays = _intermediateResult.CurrentSequentialDays;
            int maxSequentialDays = _intermediateResult.MaxSequentialDays;
            DateTime currentDate = value.Value.Date;
            if (currentDate.AddDays(-1).Equals(new DateTime(_intermediateResult.LastDate.TimeTicks)))
                sequentialDays++;
            else
            {
                maxSequentialDays = Math.Max(sequentialDays, maxSequentialDays);
                sequentialDays = 1;
            }
            _intermediateResult = new MaxConsecutiveState
                                      {
                                          CurrentSequentialDays = sequentialDays,
                                          LastDate = currentDate,
                                          MaxSequentialDays = maxSequentialDays
                                      };
        }

        /// <summary>
        /// Merge the partially computed aggregate with this aggregate.
        /// </summary>
        /// <param name="other"></param>
        public void Merge(MaxConsecutiveDates other)
        {
            // add stuff for two separate calculations
        }

        /// <summary>
        /// Called at the end of aggregation, to return the results of the aggregation.
        /// </summary>
        /// <returns></returns>
        public SqlInt32 Terminate()
        {
            int max = Math.Max((int) ((sbyte) _intermediateResult.CurrentSequentialDays), (sbyte) _intermediateResult.MaxSequentialDays);
            return new SqlInt32(max);
        }
    }
}
Induration answered 24/7, 2009 at 10:13 Comment(0)
G
5

Seems like you could take advantage of the fact that to be continuous over n days would require there to be n rows.

So something like:

SELECT users.UserId, count(1) as cnt
FROM users
WHERE users.CreationDate > now() - INTERVAL 30 DAY
GROUP BY UserId
HAVING cnt = 30
Guerdon answered 24/7, 2009 at 6:47 Comment(3)
yes, we can gate it by the number of records, for sure.. but that only eliminates some possibilities, as we could have 120 days of visiting across several years with lots of daily gapsFestoonery
Okay, but once you're caught up on the awarding of this page, you only need to run it once per day. I think for that case, something like above would do the trick. To catch up, all you need to do is turn the WHERE clause into a sliding window using BETWEEN.Guerdon
each run of the task is stateless and standalone; it has no knowledge of prior runs other than the table in the questionFestoonery
A
4

Doing this with a single SQL query seems overly complicated to me. Let me break this answer down in two parts.

  1. What you should have done until now and should start doing now:
    Run a daily cron job that checks for every user wether he has logged in today and then increments a counter if he has or sets it to 0 if he hasn't.
  2. What you should do now:
    - Export this table to a server that doesn't run your website and won't be needed for a while. ;)
    - Sort it by user, then date.
    - go through it sequentially, keep a counter...
Ardeb answered 24/7, 2009 at 6:37 Comment(1)
we can write code to query-and-loop, that's .. dary I say.. trivial. I'm curious about the SQL only way at the moment.Festoonery
L
3

You could use a recursive CTE (SQL Server 2005+):

WITH recur_date AS (
        SELECT t.userid,
               t.creationDate,
               DATEADD(day, 1, t.created) 'nextDay',
               1 'level' 
          FROM TABLE t
         UNION ALL
        SELECT t.userid,
               t.creationDate,
               DATEADD(day, 1, t.created) 'nextDay',
               rd.level + 1 'level'
          FROM TABLE t
          JOIN recur_date rd on t.creationDate = rd.nextDay AND t.userid = rd.userid)
   SELECT t.*
    FROM recur_date t
   WHERE t.level = @numDays
ORDER BY t.userid
Lyndalynde answered 24/7, 2009 at 7:6 Comment(0)
C
3

Joe Celko has a complete chapter on this in SQL for Smarties (calling it Runs and Sequences). I don't have that book at home, so when I get to work... I'll actually answer this. (assuming history table is called dbo.UserHistory and the number of days is @Days)

Another lead is from SQL Team's blog on runs

The other idea I've had, but don't have a SQL server handy to work on here is to use a CTE with a partitioned ROW_NUMBER like this:

WITH Runs
AS
  (SELECT UserID
         , CreationDate
         , ROW_NUMBER() OVER(PARTITION BY UserId
                             ORDER BY CreationDate)
           - ROW_NUMBER() OVER(PARTITION BY UserId, NoBreak
                               ORDER BY CreationDate) AS RunNumber
  FROM
     (SELECT UH.UserID
           , UH.CreationDate
           , ISNULL((SELECT TOP 1 1 
              FROM dbo.UserHistory AS Prior 
              WHERE Prior.UserId = UH.UserId 
              AND Prior.CreationDate
                  BETWEEN DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), -1)
                  AND DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), 0)), 0) AS NoBreak
      FROM dbo.UserHistory AS UH) AS Consecutive
)
SELECT UserID, MIN(CreationDate) AS RunStart, MAX(CreationDate) AS RunEnd
FROM Runs
GROUP BY UserID, RunNumber
HAVING DATEDIFF(dd, MIN(CreationDate), MAX(CreationDate)) >= @Days

The above is likely WAY HARDER than it has to be, but left as an a brain tickle for when you have some other definition of "a run" than just dates.

Ciapas answered 24/7, 2009 at 8:14 Comment(0)
P
3

A couple of SQL Server 2012 options (assuming N=100 below).

;WITH T(UserID, NRowsPrevious)
     AS (SELECT UserID,
                DATEDIFF(DAY, 
                        LAG(CreationDate, 100) 
                            OVER 
                                (PARTITION BY UserID 
                                     ORDER BY CreationDate), 
                         CreationDate)
         FROM   UserHistory)
SELECT DISTINCT UserID
FROM   T
WHERE  NRowsPrevious = 100 

Though with my sample data the following worked out more efficient

;WITH U
         AS (SELECT DISTINCT UserId
             FROM   UserHistory) /*Ideally replace with Users table*/
    SELECT UserId
    FROM   U
           CROSS APPLY (SELECT TOP 1 *
                        FROM   (SELECT 
                                       DATEDIFF(DAY, 
                                                LAG(CreationDate, 100) 
                                                  OVER 
                                                   (ORDER BY CreationDate), 
                                                 CreationDate)
                                FROM   UserHistory UH
                                WHERE  U.UserId = UH.UserID) T(NRowsPrevious)
                        WHERE  NRowsPrevious = 100) O

Both rely on the constraint stated in the question that there is at most one record per day per user.

Pagandom answered 5/1, 2012 at 21:9 Comment(0)
C
2

If this is so important to you, source this event and drive a table to give you this info. No need to kill the machine with all those crazy queries.

Conker answered 24/7, 2009 at 6:55 Comment(0)
M
1

I used a simple math property to identify who consecutively accessed the site. This property is that you should have the day difference between the first time access and last time equal to number of records in your access table log.

Here are SQL script that I tested in Oracle DB (it should work in other DBs as well):

-- show basic understand of the math properties 
  select    ceil(max (creation_date) - min (creation_date))
              max_min_days_diff,
           count ( * ) real_day_count
    from   user_access_log
group by   user_id;


-- select all users that have consecutively accessed the site 
  select   user_id
    from   user_access_log
group by   user_id
  having       ceil(max (creation_date) - min (creation_date))
           / count ( * ) = 1;



-- get the count of all users that have consecutively accessed the site 
  select   count(user_id) user_count
    from   user_access_log
group by   user_id
  having   ceil(max (creation_date) - min (creation_date))
           / count ( * ) = 1;

Table prep script:

-- create table 
create table user_access_log (id           number, user_id      number, creation_date date);


-- insert seed data 
insert into user_access_log (id, user_id, creation_date)
  values   (1, 12, sysdate);

insert into user_access_log (id, user_id, creation_date)
  values   (2, 12, sysdate + 1);

insert into user_access_log (id, user_id, creation_date)
  values   (3, 12, sysdate + 2);

insert into user_access_log (id, user_id, creation_date)
  values   (4, 16, sysdate);

insert into user_access_log (id, user_id, creation_date)
  values   (5, 16, sysdate + 1);

insert into user_access_log (id, user_id, creation_date)
  values   (6, 16, sysdate + 5);
Musso answered 24/7, 2009 at 7:13 Comment(0)
C
1

Something like this?

select distinct userid
from table t1, table t2
where t1.UserId = t2.UserId 
  AND trunc(t1.CreationDate) = trunc(t2.CreationDate) + n
  AND (
    select count(*)
    from table t3
    where t1.UserId  = t3.UserId
      and CreationDate between trunc(t1.CreationDate) and trunc(t1.CreationDate)+n
   ) = n
Cardiganshire answered 24/7, 2009 at 7:13 Comment(0)
V
1
declare @startdate as datetime, @days as int
set @startdate = cast('11 Jan 2009' as datetime) -- The startdate
set @days = 5 -- The number of consecutive days

SELECT userid
      ,count(1) as [Number of Consecutive Days]
FROM UserHistory
WHERE creationdate >= @startdate
AND creationdate < dateadd(dd, @days, cast(convert(char(11), @startdate, 113)  as datetime))
GROUP BY userid
HAVING count(1) >= @days

The statement cast(convert(char(11), @startdate, 113) as datetime) removes the time part of the date so we start at midnight.

I would assume also that the creationdate and userid columns are indexed.

I just realized that this won't tell you all the users and their total consecutive days. But will tell you which users will have been visiting a set number of days from a date of your choosing.

Revised solution:

declare @days as int
set @days = 30
select t1.userid
from UserHistory t1
where (select count(1) 
       from UserHistory t3 
       where t3.userid = t1.userid
       and t3.creationdate >= DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate), 0) 
       and t3.creationdate < DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate) + @days, 0) 
       group by t3.userid
) >= @days
group by t1.userid

I've checked this and it will query for all users and all dates. It is based on Spencer's 1st (joke?) solution, but mine works.

Update: improved the date handling in the second solution.

Vouvray answered 24/7, 2009 at 9:3 Comment(1)
close, but we need something that works for any (n) day period, not on a fixed start dateFestoonery
K
0

This should do what you want but I don't have enough data to test efficiency. The convoluted CONVERT/FLOOR stuff is to strip the time portion off the datetime field. If you're using SQL Server 2008 then you could use CAST(x.CreationDate AS DATE).

DECLARE @Range as INT
SET @Range = 10

SELECT DISTINCT UserId, CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))
  FROM tblUserLogin a
WHERE EXISTS
   (SELECT 1 
      FROM tblUserLogin b 
     WHERE a.userId = b.userId 
       AND (SELECT COUNT(DISTINCT(CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, CreationDate))))) 
              FROM tblUserLogin c 
             WHERE c.userid = b.userid 
               AND CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, c.CreationDate))) BETWEEN CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate))) and CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))+@Range-1) = @Range)

Creation script

CREATE TABLE [dbo].[tblUserLogin](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [UserId] [int] NULL,
    [CreationDate] [datetime] NULL
) ON [PRIMARY]
Kenlay answered 24/7, 2009 at 6:52 Comment(2)
pretty brutal. 26 seconds across 406,624 rows.Festoonery
How often are you checking to award the badge? If it's only once a day then a 26 second hit in a slow period doesn't seem that bad. Albeit, performance will slow down as the table grows. After re-reading the question stripping the time may not be relevant as there's only one record per day.Kenlay
B
0

Tweaking Bill's query a bit. You might have to truncate the date before grouping to count only one login per day...

SELECT UserId from History 
WHERE CreationDate > ( now() - n )
GROUP BY UserId, 
DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) AS TruncatedCreationDate  
HAVING COUNT(TruncatedCreationDate) >= n

EDITED to use DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) instead of convert( char(10) , CreationDate, 101 ).

@IDisposable I was looking to use datepart earlier but i was too lazy to look up the syntax so i figured i d use convert instead. I dint know it had a significant impact Thanks! now i know.

Bloomer answered 24/7, 2009 at 7:4 Comment(2)
Truncating a SQL DATETIME to date-only is best done with DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), 0)Ciapas
(the above works by taking the difference in whole days between 0 (e.g. 1900-01-01 00:00:00.000) and then adding that difference in whole days back to 0 (e.g. 1900-01-01 00:00:00). This results in the time portion of the DATETIME being discarded)Ciapas
L
0

Spencer almost did it, but this should be the working code:

SELECT DISTINCT UserId
FROM History h1
WHERE (
    SELECT COUNT(*) 
    FROM History
    WHERE UserId = h1.UserId AND CreationDate BETWEEN h1.CreationDate AND DATEADD(d, @n-1, h1.CreationDate)
) >= @n
Labonte answered 24/7, 2009 at 7:11 Comment(0)
G
0

Off the top of my head, MySQLish:

SELECT start.UserId
FROM UserHistory AS start
  LEFT OUTER JOIN UserHistory AS pre_start ON pre_start.UserId=start.UserId
    AND DATE(pre_start.CreationDate)=DATE_SUB(DATE(start.CreationDate), INTERVAL 1 DAY)
  LEFT OUTER JOIN UserHistory AS subsequent ON subsequent.UserId=start.UserId
    AND DATE(subsequent.CreationDate)<=DATE_ADD(DATE(start.CreationDate), INTERVAL 30 DAY)
WHERE pre_start.Id IS NULL
GROUP BY start.Id
HAVING COUNT(subsequent.Id)=30

Untested, and almost certainly needs some conversion for MSSQL, but I think that give some ideas.

Goldiegoldilocks answered 24/7, 2009 at 7:25 Comment(0)
H
0

How about one using Tally tables? It follows a more algorithmic approach, and execution plan is a breeze. Populate the tallyTable with numbers from 1 to 'MaxDaysBehind' that you want to scan the table (ie. 90 will look for 3 months behind,etc).

declare @ContinousDays int
set @ContinousDays = 30  -- select those that have 30 consecutive days

create table #tallyTable (Tally int)
insert into #tallyTable values (1)
...
insert into #tallyTable values (90) -- insert numbers for as many days behind as you want to scan

select [UserId],count(*),t.Tally from HistoryTable 
join #tallyTable as t on t.Tally>0
where [CreationDate]> getdate()[email protected] and 
      [CreationDate]<getdate()-t.Tally 
group by [UserId],t.Tally 
having count(*)>=@ContinousDays

delete #tallyTable
Hardboard answered 24/7, 2009 at 7:44 Comment(0)
F
0

assuming a schema that goes like:

create table dba.visits
(
    id  integer not null,
    user_id integer not null,
    creation_date date not null
);

this will extract contiguous ranges from a date sequence with gaps.

select l.creation_date  as start_d, -- Get first date in contiguous range
    (
        select min(a.creation_date ) as creation_date 
        from "DBA"."visits" a 
            left outer join "DBA"."visits" b on 
                   a.creation_date = dateadd(day, -1, b.creation_date ) and 
                   a.user_id  = b.user_id 
            where b.creation_date  is null and
                  a.creation_date  >= l.creation_date  and
                  a.user_id  = l.user_id 
    ) as end_d -- Get last date in contiguous range
from  "DBA"."visits" l
    left outer join "DBA"."visits" r on 
        r.creation_date  = dateadd(day, -1, l.creation_date ) and 
        r.user_id  = l.user_id 
    where r.creation_date  is null
Forepleasure answered 24/7, 2009 at 8:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.