Count unique ids in a rolling time frame
Asked Answered
T

1

8

I have a simple table as bellow with lots of IDs and dates.

ID      Date
10R46   2014-11-23  
10R46   2016-04-11  
100R9   2016-12-21
10R91   2013-05-03 
...     ...

I want to formulate a query which counts the unique IDs for a rolling time frame of dates, for example ten days. Meaning that for each date it should give me the number of unique IDs between that date and 10 days back. Result should look something like this.

UniqueTenDays    Date
200              2014-11-23 
324              2014-11-24 
522              2014-11-25
532              2014-11-26 
...              ...

Something along the lines of the bellow but I realise I need to apply the WHERE clause and count the IDs for each Date somehow.

SELECT Date, COUNT(DISTINCT ID)
FROM T 
WHERE Date BETWEEN DATE_SUB(Date, INTERVAL 10 DAY) AND Date
GROUP BY Date
ORDER BY Date

Thanks in advance.

Tressietressure answered 5/12, 2017 at 17:0 Comment(0)
L
6

Below is for BigQuery Standard SQL

#standardSQL
WITH temp1 AS (
  SELECT dt, STRING_AGG(DISTINCT id) AS users
  FROM `project.dataset.yourtable`
  GROUP BY dt
), temp2 AS (
  SELECT
    dt, 
    STRING_AGG(users) OVER(ORDER BY UNIX_DATE(dt) RANGE BETWEEN 10 PRECEDING AND CURRENT ROW) users
  FROM temp1
)
SELECT dt, 
  (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) UniqueTenDays
FROM temp2

you can test / play with it using dummy data as below

#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT '10R46' id,  DATE '2014-11-23' dt UNION ALL  
  SELECT '10R46',     DATE '2016-04-11' UNION ALL  
  SELECT '10R46',     DATE '2016-04-12' UNION ALL  
  SELECT '10R47',     DATE '2016-04-13' UNION ALL  
  SELECT '10R48',     DATE '2016-04-14' UNION ALL  
  SELECT '100R9',     DATE '2016-12-21' UNION ALL
  SELECT '10R91',     DATE '2013-05-03'
), temp1 AS (
  SELECT dt, STRING_AGG(DISTINCT id) AS users
  FROM `project.dataset.yourtable`
  GROUP BY dt
), temp2 AS (
  SELECT
    dt, 
    STRING_AGG(users) OVER(ORDER BY UNIX_DATE(dt) RANGE BETWEEN 10 PRECEDING AND CURRENT ROW) users
  FROM temp1
)
SELECT dt,  
  (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) UniqueTenDays
FROM temp2
Latta answered 5/12, 2017 at 17:42 Comment(4)
Thank you very much Mikhail, this works great. You definitely introduced me to a bunch of new concepts that I've been reading up on the past hour but it all makes sense.Tressietressure
This works great, but it gets really slow after temp2 completes when trying to do this on a bigger data set. Is there a way to make the last query where the ids get unnested more performant?Durkin
try using APPROX_COUNT_DISTINCTLatta
Wow, game changer. Thank you, and thanks for the fast reply!Durkin

© 2022 - 2024 — McMap. All rights reserved.