Unnesting in SQL (Athena): How to convert array of structs into an array of values plucked from the structs?
Asked Answered
A

2

16

I am taking samples from a Bayesian statistical model, serializing them with Avro, uploading them to S3, and querying them with Athena.

I need help writing a query that unnests an array in the table.

The CREATE TABLE query looks like:

CREATE EXTERNAL TABLE `model_posterior`(
  `job_id` bigint,
  `model_id` bigint,
  `parents` array<struct<`feature_name`:string,`feature_value`:bigint, `is_zid`:boolean>>,
  `posterior_samples` struct <`parameter`:string,`is_scaled`:boolean,`samples`:array<double>>)

The "samples" array in the "posterior_samples" column is where the samples are stored. I have managed to unnest the "posterior_samples" struct with the following query:

WITH samples AS (
    SELECT model_id, parents, sample, sample_index
    FROM posterior_db.model_posterior 
    CROSS JOIN UNNEST(posterior_samples.samples) WITH ORDINALITY AS t (sample, sample_index)
    WHERE job_id = 111000020709
)
SELECT * FROM samples

enter image description here

Now what I want is to unnest the parents column. Each record in this column is an array of structs. I am trying to create a column that just has an array of values for the "feature_value" keys in that array of structs. (The reason why I want an array is that the parents array can have a length > 1).

In other words for each array in the parents row, I want an array of the same size. That array should contain only the values of the "feature_value" key from the structs in the original array.

Any advice on how to solve this?

Thanks.

Alumina answered 27/1, 2018 at 2:51 Comment(0)
C
9

You can use transform function described here. Assuming we have table named samples with structure mentioned in your question. Then you can write query that looks something like as follows

SELECT *, transform(parents, parent -> parent.feature_value) as only_ feature_values

FROM samples

Note : this many not be the perfect query syntactically but you can play with it.

Hope this would help. Cheers :)

Collaborate answered 31/1, 2018 at 10:24 Comment(0)
A
0

To pluck values out of an array, you'd need to UNNEST the array, which expands the array of structs into separate rows, and then CROSS JOIN such that you get one row per struct inside the array.

For a table with two sets of nested arrays of structs, such as

CREATE EXTERNAL TABLE IF NOT EXISTS `my_database`.`product_category` (
  id int,
  products array < struct <
    id: string,
    variants: array < struct <
        id: string
    >>
   >>
)

If you wanted to get one row per product in each category AND one row per variant within each product, you would do something like this:

SELECT
  id,
  product.id AS product_id,
  variant.id AS variant_id
FROM
  "my_database"."product_category"
CROSS JOIN UNNEST(products) AS t(product)
CROSS JOIN UNNEST(product.variants) AS t2(variant)
Arbour answered 17/4 at 14:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.