I have a table with millions of records that I need to group into precise 10-minute intervals.
tl;dr: if you are impatient, skip to the last query in this answer, which is the actual solution; the earlier ones show step by step how to get there. All queries and schemas are also available on SQLFiddle for anyone who wants to play with them.
The best solution for this kind of problem, in my opinion, is to truncate each timestamp to the start of its 10-minute interval. For instance, we want the following conversion (original -> truncated to 10 minutes):
13:10 -> 13:10
13:15 -> 13:10
13:18 -> 13:10
13:20 -> 13:20
...
If anyone wants to try the following queries, you can create the schema as:
CREATE TABLE your_table(tscol timestamptz);
INSERT INTO your_table VALUES
('2011/01/03 19:18:00.300'),
('2011/01/03 19:18:00.503'),
('2011/01/03 19:20:26.335'),
('2011/01/03 19:29:54.289'),
('2011/01/04 14:43:43.067'),
('2011/01/04 14:50:10.727'),
('2011/01/04 14:52:26.827'),
('2011/01/04 14:57:55.608'),
('2011/01/04 14:57:55.718'),
('2011/01/04 14:59:13.603'),
('2011/01/04 15:00:34.260'),
('2011/01/04 15:02:55.687'),
('2011/01/04 15:07:15.378');
So, in order to do that, we need to understand the date_trunc and date_part functions (the latter can also be invoked through the standard EXTRACT syntax) and the interval data type. Let's build the solution step by step. The final idea is to have something like this (pseudo-code for now):
SELECT truncate_the_time_by_10_minutes(tscol) AS trunc10, count(*)
FROM your_table
GROUP BY trunc10
ORDER BY trunc10;
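Before building the real query, the three building blocks just mentioned can be tried on a single literal value, with no table needed. This is only a small sketch: the column aliases and the derived-table name t(ts) are my own, and a plain timestamp (without time zone) is used on purpose so the result does not depend on the session time zone.

```sql
-- Try date_trunc, date_part / EXTRACT and interval arithmetic on one value.
SELECT
    date_trunc('minute', ts) AS truncated_to_minute,   -- seconds and fraction zeroed
    date_part('minute', ts)  AS minute_via_date_part,  -- 18
    EXTRACT(MINUTE FROM ts)  AS minute_via_extract,    -- 18, same as date_part
    8 * interval '1 minute'  AS eight_minutes          -- 00:08:00
FROM (VALUES (timestamp '2011-01-03 19:18:00.300')) AS t(ts);
```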
Now, if the problem were "aggregate by minute", we could simply truncate the timestamp to the minute, which just means zeroing the seconds and microseconds. That is exactly what date_trunc('minute', ...) does, so:
SELECT date_trunc('minute', tscol) AS trunc_minute, count(*)
FROM your_table
GROUP BY trunc_minute
ORDER BY trunc_minute;
It works, but it is not what you want. The next coarser unit that date_trunc supports is 'hour', which would already lose the information we need, so we need something between 'minute' and 'hour'. Let's see how the above query works with some examples:
SELECT tscol, date_trunc('minute', tscol) AS trunc_minute
FROM your_table
ORDER BY tscol;
Which returns:
tscol | trunc_minute
----------------------------+------------------------
2011-01-03 19:18:00.3-02 | 2011-01-03 19:18:00-02
2011-01-03 19:18:00.503-02 | 2011-01-03 19:18:00-02
2011-01-03 19:20:26.335-02 | 2011-01-03 19:20:00-02
2011-01-03 19:29:54.289-02 | 2011-01-03 19:29:00-02
...
Look at 2011-01-03 19:18:00-02: to reach the start of its 10-minute group we just need to subtract 8 minutes. To do that:
- EXTRACT(MINUTE FROM tscol) will return 18
- As we want to truncate by 10 minutes, let's take the modulo of 18 and 10, so 18 % 10, which gives us 8
- Now we have the 8 minutes that we want to subtract, but as an integer, and to subtract from a timestamp[tz] we need an interval. As the integer represents minutes, we can do 8 * interval '1 minute', which gives us 00:08:00
Putting the 3 steps above together with the previous query, we have (I'll show each intermediate column for better understanding):
SELECT
tscol,
date_trunc('minute', tscol) AS trunc_minute,
CAST(EXTRACT(MINUTE FROM tscol) AS integer) % 10 AS min_to_subtract,
(CAST(EXTRACT(MINUTE FROM tscol) AS integer) % 10) * interval '1 minute' AS interval_to_subtract,
date_trunc('minute', tscol) - (CAST(EXTRACT(MINUTE FROM tscol) AS integer) % 10) * interval '1 minute' AS solution
FROM your_table
ORDER BY tscol;
Which returns:
tscol | trunc_minute | min_to_subtract | interval_to_subtract | solution
----------------------------+------------------------+-----------------+----------------------+------------------------
2011-01-03 19:18:00.3-02 | 2011-01-03 19:18:00-02 | 8 | 00:08:00 | 2011-01-03 19:10:00-02
2011-01-03 19:18:00.503-02 | 2011-01-03 19:18:00-02 | 8 | 00:08:00 | 2011-01-03 19:10:00-02
2011-01-03 19:20:26.335-02 | 2011-01-03 19:20:00-02 | 0 | 00:00:00 | 2011-01-03 19:20:00-02
2011-01-03 19:29:54.289-02 | 2011-01-03 19:29:00-02 | 9 | 00:09:00 | 2011-01-03 19:20:00-02
...
Now, the last column is the solution we want: the timestamp truncated to its 10-minute group. We can simply aggregate on it to get our final solution:
SELECT
date_trunc('minute', tscol) - (CAST(EXTRACT(MINUTE FROM tscol) AS integer) % 10) * interval '1 minute' AS trunc_10_minute,
count(*)
FROM your_table
GROUP BY trunc_10_minute
ORDER BY trunc_10_minute;
Which returns:
trunc_10_minute | count
------------------------+-------
2011-01-03 19:10:00-02 | 2
2011-01-03 19:20:00-02 | 2
2011-01-04 14:40:00-02 | 1
2011-01-04 14:50:00-02 | 5
2011-01-04 15:00:00-02 | 3
(5 rows)
That is not exactly the output you gave, but I believe it is what you actually expect; if not, it is just a matter of a small adjustment.
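To close the loop with the pseudo-code near the top, the expression can be wrapped in a function so the aggregate query reads the way it was first sketched. Treat this as one possible sketch rather than the definitive way to do it: the function name is simply borrowed from the pseudo-code, the parameter name ts is my own choice, and the function is marked STABLE on the assumption that tscol is timestamptz, because date_trunc on that type depends on the session time zone.

```sql
-- Hypothetical helper: same expression as the final query, wrapped in a function.
CREATE FUNCTION truncate_the_time_by_10_minutes(ts timestamptz)
RETURNS timestamptz
LANGUAGE sql
STABLE
AS $$
    SELECT date_trunc('minute', ts)
         - (CAST(EXTRACT(MINUTE FROM ts) AS integer) % 10) * interval '1 minute';
$$;

-- Usage: the pseudo-code from the beginning now runs as written.
SELECT truncate_the_time_by_10_minutes(tscol) AS trunc10, count(*)
FROM your_table
GROUP BY trunc10
ORDER BY trunc10;
```

The replacement of 10 by another width only works cleanly with this approach when the width divides 60, since the modulo is taken on the minute of the hour.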