Longest Consecutive Days Count for BigQuery

Asked 16/1, 2019 at 21:44 Answered 17/1, 2019 at 5:11

Solved mysql google-bigquery gaps-and-islands

Right now I just have an aggregate of how many days a user has worked. I'm trying to change this query to most continuous days worked.

Where u12345 would be 4 and u1 would be 2.

Is this possible to do with a BigQuery statement?

EDIT I am Kind of close with the following query but my u1 is getting 3 instead of 2.

SELECT MIN(e.timestamp) as date_created, e.uid, COUNT(e.uid) + 1 AS streak
FROM OnSite e
LEFT JOIN OnSite ee 
  ON e.uid = ee.uid 
AND DATE(e.timestamp) = DATE(DATE_ADD(ee.timestamp, INTERVAL -1 DAY))
WHERE ee.uid IS NOT NULL
GROUP BY e.uid;

Schema (MySQL v5.7)

CREATE TABLE OnSite
    (`uid` varchar(55), `worksite_id`  varchar(55), `timestamp` datetime)
;

INSERT INTO OnSite
    (`uid`, `worksite_id`, `timestamp`)
VALUES
  ("u12345", "worksite_1", '2019-01-01'),
  ("u12345", "worksite_1", '2019-01-02'),
  ("u12345", "worksite_1", '2019-01-03'),
  ("u12345", "worksite_1", '2019-01-04'),
  ("u12345", "worksite_1", '2019-01-06'),
  ("u1", "worksite_1", '2019-01-01'),
  ("u1", "worksite_1", '2019-01-02'),
  ("u1", "worksite_1", '2019-01-05'),
  ("u1", "worksite_1", '2019-01-06')

;

Query #1

SELECT    uid, COUNT(DISTINCT timestamp) Total
FROM      OnSite
GROUP BY  uid;

| uid    | Total |
| ------ | ----- |
| u1     | 4     |
| u12345 | 5     |

View on DB Fiddle

Kwang answered 16/1, 2019 at 21:44 Comment(11)

Maybe you need to process data after the query. I don't know query to do that. – Heavensent 16/1, 2019 at 21:53

@Cochu I added an edit that I'm getting close but I'm one number off – Kwang 16/1, 2019 at 21:55

u12345 would be 3, not 4 because "2019-01-04" doesn't have next day – Heavensent 16/1, 2019 at 22:3

The thought is that 2019-01-01 to 2019-01-04 is 4 consecutive days. So it would be 4 @cochu – Kwang 16/1, 2019 at 22:14

I think you will either need a stored procedure or some application code as to do this requires looping through all the data. The reason COUNT doesn't work is because your u1 has more than one streak, and days from both streaks are being counted in your result. – Myramyrah 16/1, 2019 at 22:52

@Myramyrah hmm, well my current result is just counting total days all together. But I see what you mean. Would it be possible to just list both streaks? It would suck to get 5 years worth of data for every single user and code a consecutive day streak – Kwang 16/1, 2019 at 22:55

please check my answer. It gives you expected result with no use of stored procedures – Ballast 16/1, 2019 at 23:0

It does seem @Simonare has proved me wrong as long as you are using MySQL 8 – Myramyrah 16/1, 2019 at 23:1

It appears so, unfortunately this was for BigQuery and it doesn't like it. But I never mentioned it @Myramyrah – Kwang 16/1, 2019 at 23:4

@Kwang If the answer doesn't actually solve your problem you should leave the question open so you can possibly get a solution which does work for you (you can just upvote the answer instead). I've added the bigquery tag to your question so that answers should be limited to code that works there. – Myramyrah 16/1, 2019 at 23:7

@Myramyrah I appreciate that nick, I've left the question open – Kwang 16/1, 2019 at 23:8

Below is for BigQuery Standard SQL

In case if you are interested in max consecutive days of the users on the same worksite:

#standardSQL
SELECT uid, MAX(consecuitive_days) max_consecuitive_days
FROM (
  SELECT uid, grp, COUNT(1) consecuitive_days
  FROM (
    SELECT uid, 
      COUNTIF(step > 1) OVER(PARTITION BY uid, worksite_id ORDER BY ts) grp
    FROM (
      SELECT uid, worksite_id, ts, 
        DATE_DIFF(ts, LAG(ts) OVER(PARTITION BY uid, worksite_id ORDER BY ts), DAY) step 
      FROM `project.dataset.table`
    )
  ) GROUP BY uid, grp
) GROUP BY uid

In case if worksite does not matter and you are looking just for max consecutive days:

#standardSQL
SELECT uid, MAX(consecuitive_days) max_consecuitive_days
FROM (
  SELECT uid, grp, COUNT(1) consecuitive_days
  FROM (
    SELECT uid, 
      COUNTIF(step > 1) OVER(PARTITION BY uid ORDER BY ts) grp
    FROM (
      SELECT uid, ts, 
        DATE_DIFF(ts, LAG(ts) OVER(PARTITION BY uid ORDER BY ts), DAY) step 
      FROM `project.dataset.table`
    )
  ) GROUP BY uid, grp
) GROUP BY uid

You can test, play any of above with he sample data from your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'u12345' uid, 'worksite_1' worksite_id, DATE '2019-01-01' ts UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-02' UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-03' UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-04' UNION ALL
  SELECT 'u12345', 'worksite_1', '2019-01-06' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-01' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-02' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-05' UNION ALL
  SELECT 'u1', 'worksite_1', '2019-01-06' 
)
SELECT uid, MAX(consecuitive_days) max_consecuitive_days
FROM (
  SELECT uid, grp, COUNT(1) consecuitive_days
  FROM (
    SELECT uid, 
      COUNTIF(step > 1) OVER(PARTITION BY uid ORDER BY ts) grp
    FROM (
      SELECT uid, ts, 
        DATE_DIFF(ts, LAG(ts) OVER(PARTITION BY uid ORDER BY ts), DAY) step 
      FROM `project.dataset.table`
    )
  ) GROUP BY uid, grp
) GROUP BY uid

with result:

Row uid     max_consecuitive_days    
1   u12345  4    
2   u1      2

Drape answered 17/1, 2019 at 5:11 Comment(3)

Very elegant solution. This improved my answer as well. – Ballast 17/1, 2019 at 8:12

Thank you Mikhail! This works exactly as expected. I did have a question about what you meant by for the users on the same worksite bit. I'm having trouble understanding what is going on here. It works a bit differently if I were to add FROM project.dataset.table WHERE worksite_id = 'x' (which might be the totally wrong way of focusing on one worksite) – Kwang 17/1, 2019 at 13:57

what I meant is - you can look for Longest Consecutive Days Count for each user no matter which worksite that user was working on. But another option would be to look for Longest Consecutive Days Count for each user with extra condition of being on the same worksite during those days. I do not know nature of your data and meaning of worksite - so provided you with two versions of query for respective options. hope this answers your follow up question :o) – Drape 17/1, 2019 at 19:32

does this fit for you?

set @gr=1;
select uid, max(cnt) max_cnt from (
  select uid, grp, count(*) cnt from (
    select uid, 
      case when ifnull(DATE_ADD(oldDate, INTERVAL 1 DAY), timestamp)= timestamp then 
        @gr 
      else  
        @gr := @gr +1 
      end grp
    from
    (
        SELECT    
          uid, 
          timestamp, 
          lag(timestamp) over (partition by uid order by timestamp asc) as oldDate
        FROM      OnSite
    ) t
  )t2
  group by uid, grp
  )t3
group by uid

Result

| uid    | max_cnt |
| ------ | ------- |
| u1     | 2       |
| u12345 | 4       |

DB Fiddle

Liverish answered 16/1, 2019 at 22:4 Comment(9)

I think there is some errors in the syntax, it's not running for me in the DB Fiddle – Kwang 16/1, 2019 at 22:5

I added my Db Fiddle – Ballast 16/1, 2019 at 22:10

Can you check the link? I'm just seeing my first query there – Kwang 16/1, 2019 at 22:11

Yeah, your DB Fiddle only make a group by – Heavensent 16/1, 2019 at 22:29

I am sorry, I forgot to fork :) I updated the link now – Ballast 16/1, 2019 at 22:32

I dont think this is giving me the right data back. u12345 would be 4 and u1 would be 2. – Kwang 16/1, 2019 at 22:40

Please check my answer. It now gives you expected result – Ballast 16/1, 2019 at 22:57

Awesome work Simonare, it works as expected! Thanks for looking into this. Unfortunately doesn't work for my BigQuery purposes but I never said anything about that to you so I've marked it as correct – Kwang 16/1, 2019 at 23:5

Still is a good answer even if it turns out not to be of use to OP. – Myramyrah 16/1, 2019 at 23:22

Recommended topics

Hot tags