I am taking samples from a Bayesian statistical model, serializing them with Avro, uploading them to S3, and querying them with Athena.
I need help writing a query that unnests an array in the table.
The CREATE TABLE query looks like:
CREATE EXTERNAL TABLE `model_posterior`(
`job_id` bigint,
`model_id` bigint,
`parents` array<struct<`feature_name`:string,`feature_value`:bigint, `is_zid`:boolean>>,
`posterior_samples` struct <`parameter`:string,`is_scaled`:boolean,`samples`:array<double>>)
The "samples" array in the "posterior_samples" column is where the samples are stored. I have managed to unnest the "posterior_samples" struct with the following query:
WITH samples AS (
SELECT model_id, parents, sample, sample_index
FROM posterior_db.model_posterior
CROSS JOIN UNNEST(posterior_samples.samples) WITH ORDINALITY AS t (sample, sample_index)
WHERE job_id = 111000020709
)
SELECT * FROM samples
Now what I want is to unnest the parents column. Each record in this column is an array of structs. I am trying to create a column that just has an array of values for the "feature_value" keys in that array of structs. (The reason why I want an array is that the parents array can have a length > 1).
In other words for each array in the parents row, I want an array of the same size. That array should contain only the values of the "feature_value" key from the structs in the original array.
Any advice on how to solve this?
Thanks.