Flatten data source in Snowflake from Array
Asked Answered
N

1

6

I am trying to fix an array in a dataset. Currently, I have a data set that has a reference number to multiple different uuids. What I would like to do is flatten this out in Snowflake to make it so the reference number has separate row for each uuid. For example

Reference                                       UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75 "[
                                         ""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"",
                                         ""df662812-7f97-0b43-9d3e-12f64f504fbb"",
                                          ""08644a69-76ed-ce2d-afff-b236a22efa69"",
                                          ""f1162c2e-eeb5-83f6-5307-2ed644e6b9eb"",
                                            ]"

Should end up looking like:

Reference                                UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75    05554f65-6aa9-4dd1-6271-8ce2d60f10c4
2) 9f823c2a-ced5-4dbe-be65-869311462f75    df662812-7f97-0b43-9d3e-12f64f504fbb
3) 9f823c2a-ced5-4dbe-be65-869311462f75    08644a69-76ed-ce2d-afff-b236a22efa69
4) 9f823c2a-ced5-4dbe-be65-869311462f75    f1162c2e-eeb5-83f6-5307-2ed644e6b9eb

I just started working in Snowflake so I am new to it. It looks like there is a lateral flatten, but this is either not working on telling me that I have all sorts of errors with it. The documentation from snowflake is a bit perplexing when it comes to this.

Nichollenicholls answered 24/1, 2020 at 0:49 Comment(2)
Yes you will need to pass the array into the Flatten table function to explode these values out. Are you able to post a source file snippet (JSON?) or are those entries you have listed from a table already in Snowflake? Understanding this would help with the syntax required.Lin
As Mike pointed out, it's typical to store lists like these in a variant column in JSON. If the UUID column is actually a string, I tested code that will parse and do a lateral join on the parsed string. Let me know if you want to see an example if the UUID column is not a variant type.Ruffle
G
11

While FLATTEN is the right approach when exploding an array, the UUID column value shown in the original description is invalid if interpreted as JSON syntax: "[""val1"", ""val2""]" and that'll need correction before a LATERAL FLATTEN approach can be applied by treating it as a VARIANT type.

If your data sample in the original description is a literal one and applies for all columnar values, then the following query will help transform it into a valid JSON syntax and then apply a lateral flatten to yield the desired result:

SELECT
  T.REFERENCE,
  X.VALUE AS UUID
FROM (
  SELECT
    REFERENCE,
    -- Attempts to transform an invalid JSON array syntax such as "[""a"", ""b""]"
    -- to valid JSON: ["a", "b"] by stripping away unnecessary quotes
    PARSE_JSON(REPLACE(REPLACE(REPLACE(UUID, '""', '"'), '["', '['), ']"', ']')) AS UUID_ARR_CLEANED
    FROM TABLENAME) T,
  LATERAL FLATTEN(T.UUID_ARR_CLEANED) X

If your data is already in a valid VARIANT type with a successful PARSE_JSON done for the UUID column during ingest, and the example provided in the description was just a formatting issue that only displays the JSON invalid in the post, then the simpler version of the same query as above will suffice:

SELECT REFERENCE, X.VALUE AS UUID
FROM TABLENAME, LATERAL FLATTEN(TABLENAME.UUID) X
Gauhati answered 24/1, 2020 at 6:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.