bigquery group by all columns except a few
Asked Answered
P

2

10

I have a table with loads of fields, and I am trying to group by all except two values which I am summing on. I would like to do something like

SELECT my_table.* except(value_1, value_2)
    , sum(value_1)
    , sum(value_2)
FROM my_table
GROUP BY my_table.* except(value_1, value_2)

But unfortunately GROUP BY my_table.* except(value_1, value_2) do not work. Any suggestions please?

Pitchblack answered 20/2, 2019 at 17:42 Comment(0)
P
12

Below is for BigQuery Standard SQL

#standardSQL
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
  SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
  SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
  SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
  FROM `project.dataset.table` t
)

You can test, play with above using dummy data as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 value_1, 2 value_2, 3 value_3, 4 value_4 UNION ALL
  SELECT 11, 12, 3, 14 UNION ALL
  SELECT 21, 22, 3, 14
)
SELECT DISTINCT * EXCEPT(value_1, value_2, grp),
  SUM(value_1) OVER(PARTITION BY grp) sum_value_1,
  SUM(value_2) OVER(PARTITION BY grp) sum_value_2
FROM (
  SELECT *, REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grp
  FROM `project.dataset.table` t
)

with result as

Row value_3 value_4 sum_value_1 sum_value_2  
1   3       14      32          34   
2   3       4       1           2    

Above will work with any number of columns and you don't need to reference them all explicitly - only those columns to be excluded to be explicitly referenced - value_1 and value_2 in this example

Pascoe answered 20/2, 2019 at 18:5 Comment(4)
This is fantastic! Thanks a lot! Can you explain a little what this part is doing? REGEXP_REPLACE(TO_JSON_STRING(t), r'"(?:value_1|value_2)":.+?[,}]', '') grpDobby
@Dobby - it convert whole row into json, then empties all values for value_1 and value_2 attributes so then you can use it as attribute for PARTITION BYPascoe
Are we sure that the order is always the same when serialising rows into JSONs? To me this looks risky since JSON is unordered by definition.Nomenclator
@Nomenclator TO_JSON_STRING returns a string in JSON format and does not 'create a JSON object' in a formal sense: cloud.google.com/bigquery/docs/reference/standard-sql/…Ephialtes
S
0

BigQuery has since released the GROUP BY ALL clause (currently in preview):
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_all

This allows you to rewrite your original suggestion like so:

SELECT
    my_table.* EXCEPT(value_1, value_2),
    SUM(value_1),
    SUM(value_2),
FROM my_table
GROUP BY ALL

(BigQuery has also supported trailing commas for years now so I rewrote your SELECT slightly)

The clause automatically excludes expressions that include aggregate functions so this works perfectly for your use case.

Suttle answered 6/6, 2024 at 14:31 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.