BigQuery materialized view - last in group
Asked Answered
P

3

8

In BigQuery Is it possible to create a materialized view containing the latest row for each group in a base table.

e.g.

CREATE TABLE basetable (
  group_id INT64, timestamp TIMESTAMP, value FLOAT64
);

INSERT INTO basetable (group_id, timestamp, value) VALUES
(1, '2020-01-01', 0.1), 
(1, '2020-01-02', 0.2),
(2, '2020-01-02', 0.1),
(2, '2020-01-01', 0.2);
Base table
+----------+--------------+-------+
| group_id | timestamp    | value |
+----------+--------------+-------+
|    1     | '2020-01-01' |   0.1 |
|    1     | '2020-01-02' |   0.2 |
|    2     | '2020-01-02' |   0.1 |
|    2     | '2020-01-01  |   0.2 |
+----------+--------------+-------+

I'd like materialized view to look as follows

Materialized view 
+----------+--------------+-------+
| group_id | timestamp    | value |
+----------+--------------+-------+
|    1     | '2020-01-02' |   0.2 |
|    2     | '2020-01-02' |   0.1 |
+----------+--------------+-------+

BigQuery materialized views do not support analytical functions or joins. Is there any other way to create such a view?

Perennial answered 22/7, 2020 at 7:26 Comment(0)
P
4

You can do that at most like this, be advised that the result is an Array with one item

CREATE MATERIALIZED VIEW  name as
SELECT group_id,
max(t.timestamp) as timestamp,
ARRAY_AGG(t.value  IGNORE NULLS ORDER BY t.timestamp DESC LIMIT 1) as value 
FROM table t
group by group_id

then you need a view as well

create view viewname as
SELECT group_id,timestamp
cast(value [safe_offset(0)] as string) as  value 
FROM materialized_view
Pregnable answered 27/7, 2020 at 10:0 Comment(4)
I thought that there wouldn't be difference, so I didn't include all the columns I have in the table. Unfortunately, there is difference. Can you think of a way to do it if there're multiple columns I need to get, not only value? I tried combining them with TO_JSON_STRING, but this yields an error Unsupported operator in materialized view: Struct.Perennial
@JuriKrainjukov at this stage of the Materialized View feature, this is the most you get. You write multiple lines if you need additional coluns, the same line as value but a separate line for each additional columnPregnable
Just wanted to say thanks for this, super helpful! For people using DBT, the column mapping can be taken care of using api.Relation.create and adapter.get_columns_in_relation to get all the columns and then rendering them in a for loop, excluding the column grouped by.Huth
in case values are a JSON value (since 2022-01), running the offset on the materialized view will incur in a full scan of the JSON instead of taking advantage of the JSON shredding. I didn't find a good solution working with JSONs yet unfortunatelyMoquette
E
2

You could use a single query with exists logic to ensure that the most recent record for each group_id is selected:

SELECT group_id, timestamp, value
FROM yourTable t1
WHERE NOT EXISTS (SELECT 1 FROM yourTable t2
                  WHERE t2.group_id = t1.group_id AND t2.timestamp > t1.timestamp);

I cannot think of any way to get the result set you want without using either analytic functions or a subquery of some kind. The above might be the leanest option here, without using analytic functions.

Evyn answered 22/7, 2020 at 7:31 Comment(2)
This query works by itself. Unfortunately, when I create materialize view it shows error Unsupported operator in materialized view: Limit.Perennial
Then you may be out of luck. I can't think of a way to write your view without either using analytic functions or a subquery on the same table.Evyn
P
1

I was trying to solve the same problem and I found that it is now possible to do this with a single query by using the new MAX_BY aggregation. The support for materialized views was added on Aug 8, 2023.

The final query would be:

CREATE MATERIALIZED VIEW test as
SELECT 
  group_id,

  MAX(timestamp) as timestamp,
  MAX_BY(value, timestamp) as value,
FROM table
GROUP BY 1
Prelude answered 25/9, 2023 at 18:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.