Unnesting structs in BigQuery
Asked Answered
S

3

15

What is the correct way to flatten a struct of two arrays in BigQuery? I have a dataset like the one pictured here (the struct.destination and struct.visitors arrays are ordered - i.e. the visitor counts correspond specifically to the destinations in the same row):

enter image description here

I want to reorganize the data so that I have a total visitor count for each unique combination of origins and destinations. Ideally, the end result will look like this:

enter image description here

I tried using UNNEST twice in a row - once on struct.destination and then on struct.visitors, but this produces the wrong result (each destination gets mapped to every value in the array of visitor counts when it should only get mapped to the value in the same row):

SELECT
  origin,
  unnested_destination,
  unnested_visitors
FROM
  dataset.table,
  UNNEST(struct.destination) AS unnested_destination,
  UNNEST(struct.visitors) AS unnested_visitors
Sarina answered 18/11, 2019 at 18:29 Comment(0)
H
16

You have one struct that is repeated. So, I think you want:

SELECT origin,
       s.destination,
       s.visitors
FROM dataset.table t CROSS JOIN
     UNNEST(t.struct) s;

EDIT:

I see, you have a struct of two arrays. You can do:

SELECT origin, d.destination, v.visitors
FROM dataset.table t CROSS JOIN
     UNNEST(struct.destination) s WITH OFFSET nd LEFT JOIN
     UNNEST(struct.visitors) v WITH OFFSET nv
     ON nd = nv
Hudson answered 18/11, 2019 at 18:31 Comment(2)
Thanks for the super fast response Gordon. I tried your suggestion but BigQuery is giving me an error: "Values referenced in UNNEST must be arrays. UNNEST contains expression of type STRUCT<destination ARRAY<STRING>, visitors ARRAY<STRING>>"Sarina
Was s WITH offset supposed to be d WITH offset?Notice
B
0

Difficult to test by not having the underlying data to test on, so I created my own query with your dataset. As far as I can tell destination|visitors is not in an ARRAY-format, but rather in a STRUCT-format, so you do not need UNNEST it. Also view this thread please :)

SELECT
  origin,
  COUNT(struct.destination),
  COUNT(struct.visitors)
FROM dataset.table
GROUP BY 1
Bighead answered 31/3, 2021 at 12:14 Comment(0)
B
0

I found that this construction aligns struct and row data correctly:

SELECT 
   T0.origin AS origin,
   S.destination AS destination,
   S.visitors AS total_visitors
FROM dataset.table T0,
UNNEST(struct) AS S
Barbarize answered 5/7, 2023 at 15:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.