Querying JSON fields in Redshift
Asked Answered
P

4

14

We plan to start using Redshift soon, and one of our fields (columns) is a a JSON value. It's a flat JSON (i.e. by definition no nested levels) and the reason we must use JSON is that each record has different number of different key-value elements, which may very from 0 to 10 or more (so I can't use a field per pair or such).

For example, such field may be {"key1":"value1", "key2":"value2", ..., "key5":"value5"}

I would like to query and count all records having some specific key, and possibly group them by value. In the example above I would like something like "select count(*) where field has key 'key1' group by its value".

Does Redshift support querying by values within the JSON? How can such be achieved?

Phillisphilly answered 28/10, 2014 at 15:59 Comment(0)
A
15

Did you try using Redshift's JSON_EXTRACT_PATH_TEXT function?

Avert answered 28/10, 2014 at 16:40 Comment(1)
Saw that, wasn't sure if it can answer my requirements and whether I can use it in a where clause (for technical reasons I don't yet have hands-on a RS DB).Phillisphilly
C
29

Yes, Amazon Redshift supports parsing JSON string within a column with "JSON_EXTRACT_PATH_TEXT" function, and you can call this function even in where clause or group by clause. It's better to see the following example to understand how it works.

db=> create table json_test (id int primary key, json text);
db=> insert into json_test values (1, '{"key1":1, "key2":"a"}');
db=> insert into json_test values (2, '{"key1":2, "key2":"b"}');
db=> insert into json_test values (3, '{"key1":3, "key2":"a"}');
db=> insert into json_test values (4, '{"key3":0}');
db=> select * from json_test order by id;
 id |          json
----+------------------------
  1 | {"key1":1, "key2":"a"}
  2 | {"key1":2, "key2":"b"}
  3 | {"key1":3, "key2":"a"}
  4 | {"key3":0}
(4 rows)


-- In select list
db=> select json_extract_path_text(json, 'key2') as key2 from json_test where id = 1;
 key2
------
 a
(1 row)


-- Where clause
db=> select * from json_test where json_extract_path_text(json, 'key1') = 1;
 id |          json
----+------------------------
  1 | {"key1":1, "key2":"a"}
(1 row)


-- Group by
db=> select min(id) as min_id from json_test group by json_extract_path_text(json, 'key2') order by min_id;
 min_id
--------
      1
      2
      4
(3 rows)

See Redshift Dev Guide - JSON_EXTRACT_PATH_TEXT Function for the details of "JSON_EXTRACT_PATH_TEXT" function. Also you can see other JSON functions in Redshift Dev Guide - JSON Functions.

Celt answered 29/10, 2014 at 20:38 Comment(2)
How fast is that function to use? Does it have major speed implications compared to splitting the json into its own columns, and just adding new columns for new keys?Monitor
May have speed implications due to the following ... "We recommend using JSON sparingly. JSON is not a good choice for storing larger datasets because, by storing disparate data in a single column, JSON does not leverage Amazon Redshift’s column store architecture."Redshift Dev Guide - JSON FunctionsMercurial
A
15

Did you try using Redshift's JSON_EXTRACT_PATH_TEXT function?

Avert answered 28/10, 2014 at 16:40 Comment(1)
Saw that, wasn't sure if it can answer my requirements and whether I can use it in a where clause (for technical reasons I don't yet have hands-on a RS DB).Phillisphilly
S
2

Since April 2021, Amazon Redshift provides native support for JSON using SUPER data type. It provides advanced features like dynamic typing and objects unpivoting (see AWS doc). Using SUPER data type make it much more easier to work with JSON data:

  1. First, convert your JSON column into SUPER data type using JSON_PARSE() function.
  2. Then use PartiQL to navigate into your JSON document (see AWS doc)

Example:

CREATE TABLE bar AS SELECT '{"key1":1, "key2":[10,11,12]}' AS data;

SELECT json_parse(data).key2[0] as v FROM bar;

-- returns 1 row with v=10

Side note: it is often a good idea to use a CTE to apply JSON_PARSE().

Soble answered 16/3, 2022 at 23:31 Comment(0)
A
0

As noted in the other answers, JSON_EXTRACT_PATH_TEXT is the way to parse json so you can query on json properties. However, it there are major implications for speed. If you need to filter on something in the json blob, then you should extract that and store as a normal field on the record.

Abisia answered 13/2, 2019 at 18:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.