How to fill timestamp gaps in a Postgres query?

Given the following table:

CREATE TABLE channel1m (
  ts TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  itemId BIGINT,
  value BIGINT
);

in which a row may be inserted each minute, per itemId, as follows:

ts                    itemId         value
2012-12-03 15:29:00   100            1
2012-12-03 15:30:00   100            2
2012-12-03 15:30:00   101            0
2012-12-03 15:32:00   100            1
2012-12-03 15:32:00   101            1

I can't find a way, without creating additional tables, to write a query that fills the time gaps (for example, 15:29:00 for itemId 101 and 15:31:00 for both items) by returning NULL as the value.

The expected resultset would be:

ts                    itemId         value
2012-12-03 15:29:00   100            1
2012-12-03 15:29:00   101            NULL
2012-12-03 15:30:00   100            2
2012-12-03 15:30:00   101            0
2012-12-03 15:31:00   100            NULL
2012-12-03 15:31:00   101            NULL
2012-12-03 15:32:00   100            1
2012-12-03 15:32:00   101            1

I've found solutions that use a separate time table containing the full series of timestamps, but I would much prefer to solve this in the query alone. Is this possible?

Unity answered 5/12, 2012 at 17:2 Comment(3)
LEFT JOIN with a calendar table, which can be generated by generate_series(min(ts), max(ts), '1 minute') - Wersh
@wildplasser: You should make that an answer. - Doordie
I was working on it ... Including (nested!) CTEs - Wersh
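The calendar suggested in the comment above can be illustrated in plain Python. This is only a sketch of what generate_series(min(ts), max(ts), interval '1 minute') returns, assuming naive datetime values aligned to whole minutes; minute_calendar is a hypothetical helper, not part of any library.

```python
from datetime import datetime, timedelta


def minute_calendar(timestamps):
    """Every minute from min(timestamps) to max(timestamps), inclusive,
    like generate_series(min(ts), max(ts), interval '1 minute')."""
    current, end = min(timestamps), max(timestamps)
    step = timedelta(minutes=1)
    calendar = []
    while current <= end:
        calendar.append(current)
        current += step
    return calendar


# Distinct timestamps from the question's sample data
observed = [
    datetime(2012, 12, 3, 15, 29),
    datetime(2012, 12, 3, 15, 30),
    datetime(2012, 12, 3, 15, 32),
]
calendar = minute_calendar(observed)
# 15:31 is present even though no row has that timestamp
print([t.strftime("%H:%M") for t in calendar])
# ['15:29', '15:30', '15:31', '15:32']
```

LEFT JOINing the data onto such a calendar is what turns the missing minute into a row with a NULL value.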
DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path = tmp;

DROP TABLE IF EXISTS channel1m CASCADE;
CREATE TABLE channel1m (
  zts TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  zitemid BIGINT,
  zvalue BIGINT
);

-- in which a row may be inserted each minute, per zitemid, as follows:

INSERT INTO channel1m(zts, zitemid, zvalue) VALUES
 ('2012-12-03 15:29:00',   100,            1)
,('2012-12-03 15:30:00',   100,            2)
,('2012-12-03 15:30:00',   101,            0)
,('2012-12-03 15:32:00',   100,            1)
,('2012-12-03 15:32:00',   101,            1)
        ;

        -- CTE to the rescue!!!
WITH cal AS (
        WITH mm AS (
                SELECT MIN(xx.zts) AS minmin, MAX(xx.zts) AS maxmax
                 FROM channel1m xx)
        SELECT generate_series(mm.minmin , mm.maxmax , '1 min'::interval) AS stamp
        FROM mm
        )
, ite AS (
        SELECT DISTINCT zitemid AS zitemid
        FROM channel1m
        )
SELECT cal.stamp
        , ite.zitemid
        , tab.zvalue
FROM cal
CROSS JOIN ite -- Note: this is a cartesian product of the {time,id} -domains
LEFT JOIN channel1m tab ON tab.zts = cal.stamp AND tab.zitemid = ite.zitemid
ORDER BY stamp ASC
        ;

Output:

NOTICE:  drop cascades to table tmp.channel1m
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE:  table "channel1m" does not exist, skipping
DROP TABLE
CREATE TABLE
INSERT 0 5
        stamp        | zitemid | zvalue 
---------------------+---------+--------
 2012-12-03 15:29:00 |     101 |       
 2012-12-03 15:29:00 |     100 |      1
 2012-12-03 15:30:00 |     100 |      2
 2012-12-03 15:30:00 |     101 |      0
 2012-12-03 15:31:00 |     100 |       
 2012-12-03 15:31:00 |     101 |       
 2012-12-03 15:32:00 |     100 |      1
 2012-12-03 15:32:00 |     101 |      1
(8 rows)
Wersh answered 5/12, 2012 at 17:22 Comment(1)
I was missing that cartesian product while trying to tackle a similar problem. Thanks! - Hallmark
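The cartesian product of the {time, id} domains that the answer builds by cross-joining cal and ite can be illustrated with itertools.product. A sketch assuming the question's sample data; the variable names are illustrative only.

```python
from datetime import datetime, timedelta
from itertools import product

# Sample rows from the question: (ts, itemid, value)
start = datetime(2012, 12, 3, 15, 29)
rows = [
    (start, 100, 1),
    (start + timedelta(minutes=1), 100, 2),
    (start + timedelta(minutes=1), 101, 0),
    (start + timedelta(minutes=3), 100, 1),
    (start + timedelta(minutes=3), 101, 1),
]

# cal: the four minutes 15:29 .. 15:32
cal = [start + timedelta(minutes=n) for n in range(4)]

# ite: SELECT DISTINCT itemid
ite = sorted({item_id for _, item_id, _ in rows})

# Cartesian product: every (minute, item id) pair, 4 x 2 = 8 rows
pairs = list(product(cal, ite))
print(len(pairs))  # 8
```

Every expected output row corresponds to exactly one of these pairs; the LEFT JOIN then only decides whether its value is a number or NULL.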

You will need a table with all itemIds and a (pseudo-)table with all required timestamps.

You probably already have a table with all distinct itemIds. Let's call it item_table.

You can get the pseudo-table of timestamps with generate_series('start_date', 'end_date', interval '1 minute'). See the PostgreSQL documentation on set-returning functions for details.

The query:

SELECT gs.ts, it.itemId, ch1m.value
FROM item_table it
CROSS JOIN generate_series('start_date'::timestamp, 'end_date'::timestamp, interval '1 minute') gs(ts)
LEFT JOIN channel1m ch1m ON it.itemId = ch1m.itemId
                         AND gs.ts = ch1m.ts
ORDER BY gs.ts, it.itemId;

Replace 'start_date' and 'end_date' with the desired values, or get them from a subquery.

This query:

1) Builds all item-time pairs via CROSS JOIN

2) Gets the value via LEFT JOIN, returning NULL where no matching row exists
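The same two steps can be sketched in Python to show where the NULLs come from: the cross join supplies every (time, item) pair, and the left join looks each pair up, yielding None when nothing matches. This assumes at most one row per (ts, itemId); a real LEFT JOIN would return every matching row. fill_gaps is a hypothetical helper, and it takes item ids from the rows themselves rather than from item_table.

```python
from datetime import datetime, timedelta
from itertools import product


def fill_gaps(rows, start, end, step=timedelta(minutes=1)):
    """rows: (ts, item_id, value) tuples, assumed unique per (ts, item_id).
    Returns (ts, item_id, value) for every step in [start, end] and every
    item id seen in rows, with None where no row exists."""
    # Time axis, like generate_series(start, end, step)
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += step
    item_ids = sorted({item_id for _, item_id, _ in rows})
    # LEFT JOIN: look each (time, item) pair up; a miss gives None (NULL)
    lookup = {(ts, item_id): value for ts, item_id, value in rows}
    return [(ts, item_id, lookup.get((ts, item_id)))
            for ts, item_id in product(times, item_ids)]  # CROSS JOIN


def at(minute):
    return datetime(2012, 12, 3, 15, minute)


rows = [(at(29), 100, 1), (at(30), 100, 2), (at(30), 101, 0),
        (at(32), 100, 1), (at(32), 101, 1)]
for ts, item_id, value in fill_gaps(rows, at(29), at(32)):
    print(ts, item_id, value)
```

The printed rows follow the same order and values as the expected result set in the question, with None in place of NULL.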

Coretta answered 5/12, 2012 at 17:20 Comment(0)

I think the most readable way is to build a series of table expressions. A cross join between minutes and item ID numbers will give you every combination.

with all_minutes as (
  select '2012-12-03 15:29'::timestamp + n * interval '1 minute' as ts
  from generate_series(0,10) n
),
item_ids as (
  select distinct itemid from channel1m
),
all_items_and_minutes as (
  select all_minutes.ts, item_ids.itemid from all_minutes cross join item_ids
)
select all_items_and_minutes.ts, all_items_and_minutes.itemId, channel1m.value
from all_items_and_minutes 
left join channel1m 
       on all_items_and_minutes.ts = channel1m.ts
      and all_items_and_minutes.itemid = channel1m.itemid
order by all_items_and_minutes.ts, all_items_and_minutes.itemid

You can replace the timestamp literals with SELECT statements to get the actual range you need. If you have a different table that contains all the unique item id numbers, you might be better off selecting from that table instead of selecting distinct values from the channel1m table.
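Both suggestions in the paragraph above can be sketched in Python under assumed data: the minute range is derived from min(ts) and max(ts) instead of literals, and a hypothetical item 102 exists in an item table but has no rows in channel1m. Taking the ids from channel1m drops item 102 entirely, while the item table keeps it with None (NULL) for every minute.

```python
from datetime import datetime, timedelta
from itertools import product


def at(minute):
    return datetime(2012, 12, 3, 15, minute)


rows = [(at(29), 100, 1), (at(30), 100, 2), (at(30), 101, 0),
        (at(32), 100, 1), (at(32), 101, 1)]

# all_minutes with the range taken from the data instead of literals:
# min(ts) + n minutes for n = 0 .. whole minutes between min(ts) and max(ts)
first = min(ts for ts, _, _ in rows)
last = max(ts for ts, _, _ in rows)
span = (last - first) // timedelta(minutes=1)
all_minutes = [first + n * timedelta(minutes=1) for n in range(span + 1)]

ids_from_data = sorted({item_id for _, item_id, _ in rows})
ids_from_item_table = [100, 101, 102]  # hypothetical: 102 has no rows yet

lookup = {(ts, item_id): value for ts, item_id, value in rows}


def fill(item_ids):
    """Cross join all_minutes with item_ids, then left join the data."""
    return [(ts, item_id, lookup.get((ts, item_id)))
            for ts, item_id in product(all_minutes, item_ids)]


print(len(fill(ids_from_data)), len(fill(ids_from_item_table)))  # 8 12
```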

Redmund answered 5/12, 2012 at 17:28 Comment(0)
  1. Use time_bucket (from the TimescaleDB extension) or date_trunc to create bucket boundaries.
  2. Use generate_series to generate empty buckets for your timeframe.
  3. Use UNION to merge the empty time-bucket dataset with your data.
  4. Use DISTINCT ON to pick one row per time bucket, preferring rows with data.

Example:

WITH
  timeseries_data AS (
    SELECT
      time_bucket(interval '5 minutes', started_at) time_bucket_start,
      count(distinct v1.value) unique_row_count
    FROM
      probe_execution pe1
    CROSS JOIN LATERAL (
      SELECT value
      FROM jsonb_array_elements(pe1.result)
    ) v1
    WHERE
      pe1.probe_id = 8 AND
      pe1.result_count > 0 AND
      pe1.started_at > now() - interval '1 day' AND
      pe1.ended_at < now()
    GROUP BY time_bucket_start
    UNION
    SELECT
      gs1 time_bucket_start,
      0 unique_row_count
    FROM
      generate_series(
        time_bucket(interval '5 minutes', now() - interval '1 day'),
        time_bucket(interval '5 minutes', now()),
        interval '5 minutes'
      ) as gs1
  )
SELECT DISTINCT ON (td1.time_bucket_start)
  td1.time_bucket_start,
  td1.unique_row_count
FROM timeseries_data td1
ORDER BY td1.time_bucket_start, td1.unique_row_count DESC
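The four steps listed above can be mirrored in plain Python to show how the empty buckets and the "prefer rows with data" rule interact. This is a sketch, not TimescaleDB's time_bucket: bucket_start simply floors to 5 minutes, the event data is made up, and all names are hypothetical. Note that the empty-bucket step equals the bucket width; a smaller step would add rows that do not start a bucket.

```python
from collections import defaultdict
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)


def bucket_start(ts):
    """Floor ts to the start of its 5-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def bucketed_counts(events, window_start, window_end):
    """events: (ts, value) pairs, assumed to lie inside the window.
    Returns {bucket_start: distinct value count}, 0 for empty buckets."""
    # Step 1: bucket the data and count distinct values per bucket
    values_per_bucket = defaultdict(set)
    for ts, value in events:
        values_per_bucket[bucket_start(ts)].add(value)
    with_data = [(b, len(v)) for b, v in values_per_bucket.items()]
    # Step 2: one zero row per bucket; the step equals the bucket width
    empty = []
    b = bucket_start(window_start)
    while b <= bucket_start(window_end):
        empty.append((b, 0))
        b += STEP
    # Steps 3 and 4: merge both lists and keep one row per bucket,
    # preferring the higher count (DISTINCT ON ... ORDER BY count DESC)
    best = {}
    for b, count in with_data + empty:
        best[b] = max(count, best.get(b, 0))
    return dict(sorted(best.items()))


def at(hour, minute):
    return datetime(2019, 3, 3, hour, minute)


events = [(at(10, 1), "a"), (at(10, 3), "b"), (at(10, 4), "a"), (at(10, 12), "c")]
counts = bucketed_counts(events, at(10, 0), at(10, 14))
print([(b.strftime("%H:%M"), n) for b, n in counts.items()])
# [('10:00', 2), ('10:05', 0), ('10:10', 1)]
```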
Hannahhannan answered 3/3, 2019 at 18:43 Comment(0)
