How to convert varchar to array in Presto Athena
Asked Answered
D

1

10

My data is in VARCHAR format. I want to split both the elements of this array so that I can then extract a key value from the JSON.

Data format

[
  {
    "skuId": "5bc87ae20d298a283c297ca1",
    "unitPrice": 0,
    "id": "5bc87ae20d298a283c297ca1",
    "quantity": "1"
  },
  
{
    "skuId": "182784738484wefhdchs4848",
    "unitPrice": 50,
    "id": "5bc87ae20d298a283c297ca1",
    "quantity": "4"
  },
]

For example I want to extract skuid from the above column. So my data after extraction should look like:

1 5bc87ae20d298a283c297ca1
2 182784738484wefhdchs4848

Cast to array doesn't work either:

SELECT CAST(col AS ARRAY)

gives the following error:

Unknown type: array

So I am not able to un-nest the array.

How do I do solve this problem in Presto Athena?

Dennett answered 16/5, 2019 at 11:38 Comment(1)
were. you able to get a solution to extract the data on Athena? I just tried the below solution and ended up in an error: INVALID_CAST_ARGUMENT: Cannot cast JSON to array(row(skuid varchar))Protagonist
I
11

You can use a combination of parsing the value as JSON, casting it to a structured SQL type (array/map/row), and UNNEST WITH ORDINALITY to extract the elements from the array as separate rows. Note that this only works if the array elements in the JSON payload don't have a trailing commas. Your example has one but it is removed from the example below.

WITH data(value) AS (VALUES
 '[
    {
      "skuId": "5bc87ae20d298a283c297ca1",
      "unitPrice": 0,
      "id": "5bc87ae20d298a283c297ca1",
      "quantity": "1"
    },
    {
      "skuId": "182784738484wefhdchs4848",
      "unitPrice": 50,
      "id": "5bc87ae20d298a283c297ca1",
      "quantity": "4"
    }
  ]'
),
parsed(entries) AS (
  SELECT cast(json_parse(value) AS array(row(skuId varchar)))
  FROM data
)
SELECT ordinal, skuId
FROM parsed, UNNEST(entries) WITH ORDINALITY t(skuId, ordinal)

produces:

 ordinal |          skuId
---------+--------------------------
       1 | 5bc87ae20d298a283c297ca1
       2 | 182784738484wefhdchs4848
(2 rows)
Impatient answered 16/5, 2019 at 20:32 Comment(2)
Martin: I have tried executing snippet provided by you on aws athena. I get an error : INVALID_CAST_ARGUMENT: Cannot cast JSON to array(row(skuid varchar))Protagonist
AWS Athena has two versions, which map to specific versions of Presto. Athena version 1 used Presto 0.172; Athena version 2 uses version 0.217. See here for details. Please check what version of Athena you're using and whether the corresponding version of Presto supports these sorts of operations.Correspond

© 2022 - 2024 — McMap. All rights reserved.