BigQuery select data within a time interval

Asked 27/4, 2015 at 14:53 Answered 8/2, 2018 at 20:38

My data looks like this:

name | From  | To_City | Date of request
Andy | Paris | London  | 08/21/2014 12:00
Lena | Koln  | Berlin  | 08/22/2014 18:00
Andy | Paris | London  | 08/22/2014 06:00
Lisa | Rome  | Neapel  | 08/25/2014 18:00
Lena | Rome  | London  | 08/21/2014 20:00
Lisa | Rome  | Neapel  | 08/24/2014 18:00
Andy | Paris | London  | 08/25/2014 12:00

I want to find how many identical drive requests a person had within +/- one day. I'd love to receive a table saying:

name | From  | To_City | avg Date of request | # requests
Andy | Paris | London  | 08/21/2014 21:00    | 2
Lena | Koln  | Berlin  | 08/22/2014 18:00    | 1
Lisa | Rome  | Neapel  | 08/25/2014 06:00    | 2
Lena | Rome  | London  | 08/21/2014 20:00    | 1
Andy | Paris | London  | 08/25/2014 12:00    | 1

This would be the result of a GROUP BY clause. But is it feasible in general to write a condition that checks whether, and how many, identical requests were made within 24 hours of an initial request? For now I download the data into Excel and do it there, but there is a lot of data, so that is not efficient.

Sample data:

Let's build a sample dataset first:

select * from (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date)

Mejia answered 27/4, 2015 at 14:53

One way to do it is to use a window function with a RANGE frame. RANGE needs a numeric ordering column, so the dates are first converted to whole-day numbers. The PARTITION BY clause is similar to GROUP BY: it lists the columns that define "identical" drive requests (in your case name, from and to). Then you can simply use COUNT(*) to count the number of requests that fall within that window. Note that because the timestamps are reduced to day numbers, the window spans the previous, same and next day number rather than exactly 24 hours on either side.

select name, f, to, date, count(*) 
  over(partition by name, f, to
       order by day
       range between 1 preceding and 1 following) from (
select name, f, to, date, integer(timestamp(date)/1000000/60/60/24) day from
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date))
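
For an exact +/- 24-hour window, the same idea can be sketched in BigQuery standard SQL by ordering on Unix seconds instead of day numbers. This is a minimal sketch under assumptions, not part of the original answer: the requests CTE and the column names from_city, to_city and ts are made up for illustration (to is a reserved word in standard SQL, so the original column name is not reused).

```sql
-- Assumed sample data; column names are illustrative, not from a real table.
WITH requests AS (
  SELECT 'Andy' AS name, 'Paris' AS from_city, 'London' AS to_city,
         TIMESTAMP '2014-08-21 12:00:00' AS ts UNION ALL
  SELECT 'Lena', 'Koln',  'Berlin', TIMESTAMP '2014-08-22 18:00:00' UNION ALL
  SELECT 'Andy', 'Paris', 'London', TIMESTAMP '2014-08-22 06:00:00' UNION ALL
  SELECT 'Lisa', 'Rome',  'Neapel', TIMESTAMP '2014-08-25 18:00:00' UNION ALL
  SELECT 'Lena', 'Rome',  'London', TIMESTAMP '2014-08-21 20:00:00' UNION ALL
  SELECT 'Lisa', 'Rome',  'Neapel', TIMESTAMP '2014-08-24 18:00:00' UNION ALL
  SELECT 'Andy', 'Paris', 'London', TIMESTAMP '2014-08-25 12:00:00'
)
SELECT
  name,
  from_city,
  to_city,
  ts,
  -- Number of identical requests within 24 hours before or after this one
  -- (the current row is included in the count).
  COUNT(*) OVER w AS requests_within_24h,
  -- Average request time of the rows in the same window.
  TIMESTAMP_SECONDS(CAST(AVG(UNIX_SECONDS(ts)) OVER w AS INT64)) AS avg_ts
FROM requests
WINDOW w AS (
  PARTITION BY name, from_city, to_city
  ORDER BY UNIX_SECONDS(ts)
  -- 86400 seconds = 24 hours; RANGE bounds are inclusive.
  RANGE BETWEEN 86400 PRECEDING AND 86400 FOLLOWING
)
ORDER BY name, ts
```

This still returns one row per request, so Andy's two requests 18 hours apart should each report a count of 2, as should Lisa's two requests exactly 24 hours apart. Collapsing each cluster into a single row, as in the desired output, would need an additional grouping step.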

Purusha answered 27/4, 2015 at 16:25

You could truncate the date to drop the hours, minutes and seconds, then group by that column:

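-- Keep only the 'YYYY-MM-DD' part (the first 10 characters) of the timestamp string
SELECT SUBSTR(STRING(date_of_request), 1, 10) AS day, COUNT(*) AS requests
FROM t1
GROUP BY day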
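
Applied to the question's data, the truncation approach might look like the following BigQuery standard SQL sketch. The requests CTE and the column names from_city, to_city and ts are assumptions for illustration, and DATE(ts) is used here in place of the string truncation.

```sql
-- Assumed sample data; column names are illustrative, not from a real table.
WITH requests AS (
  SELECT 'Andy' AS name, 'Paris' AS from_city, 'London' AS to_city,
         TIMESTAMP '2014-08-21 12:00:00' AS ts UNION ALL
  SELECT 'Lena', 'Koln',  'Berlin', TIMESTAMP '2014-08-22 18:00:00' UNION ALL
  SELECT 'Andy', 'Paris', 'London', TIMESTAMP '2014-08-22 06:00:00' UNION ALL
  SELECT 'Lisa', 'Rome',  'Neapel', TIMESTAMP '2014-08-25 18:00:00' UNION ALL
  SELECT 'Lena', 'Rome',  'London', TIMESTAMP '2014-08-21 20:00:00' UNION ALL
  SELECT 'Lisa', 'Rome',  'Neapel', TIMESTAMP '2014-08-24 18:00:00' UNION ALL
  SELECT 'Andy', 'Paris', 'London', TIMESTAMP '2014-08-25 12:00:00'
)
SELECT
  name,
  from_city,
  to_city,
  DATE(ts) AS day,        -- calendar day (UTC) of the request
  COUNT(*) AS requests    -- identical requests made on that calendar day
FROM requests
GROUP BY name, from_city, to_city, day
ORDER BY name, day
```

Grouping by calendar day only merges requests made on the same date. Requests that fall within 24 hours of each other but on different dates, such as Andy's on 08/21 12:00 and 08/22 06:00, land in separate groups, so this is an approximation of the +/- one day condition rather than an exact match.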

Underhill answered 8/2, 2018 at 20:38
