Cast JSON to HSTORE in Postgres 9.3+?
Asked Answered
P

2

6

I've read the docs and it appears that there's no discernible way to perform an ALTER TABLE ... ALTER COLUMN ... USING statement to directly convert a json type column to an hstore type. There's no function available (that I'm aware of) to perform the cast.

The next best alternative I have is to create a new column of type hstore, copy my JSON data to that new column using some external tool, drop the old json column and rename the new hstore column to the old column's name.

Is there a better way?

What I have so far is:

$ CREATE TABLE blah (unstructured_data JSON);

$ ALTER TABLE blah ALTER COLUMN unstructured_data 
       TYPE hstore USING CAST(unstructured_data AS hstore);
ERROR:   cannot cast type json to hstore
Protamine answered 4/2, 2015 at 7:39 Comment(2)
JSON can have multi dimensional arrays whereas hstore can not. This means that simply casting between the two isn't a straight forward procedure. There are plenty of user functions available so long as you are aware that the two data types are not always equal.Glary
I hear you--assuming my JSON is not nested, has only quoted property names and has only strings for properties, should I be able to write a casting function from JSON to HSTORE? Or would you recommend perform this "update" operation in a for loop outside Postgres?Protamine
C
12

Unfortunately, PostgreSQL doesn't allow all kind of expressions within the USING clause of ALTER TABLE ... SET DATA TYPE ... (f.ex. sub-queries are disallowed).

But, you can write a function to overcome this, you just need to decide what to do with advanced types (in object's values), like arrays & objects. Here is an example, which simply converts them to string:

CREATE OR REPLACE FUNCTION my_json_to_hstore(json)
  RETURNS hstore
  IMMUTABLE
  STRICT
  LANGUAGE sql
AS $func$
  SELECT hstore(array_agg(key), array_agg(value))
  FROM   json_each_text($1)
$func$;

After that, you can use this in your ALTER TABLE, like:

ALTER TABLE blah
  ALTER COLUMN unstructured_data
  SET DATA TYPE hstore USING my_json_to_hstore(unstructured_data);
Charisecharisma answered 4/2, 2015 at 10:41 Comment(5)
I get Error : ERROR: default for column "my_column" cannot be cast automatically to type hstore when running the ALTERTriviality
In JSON, the last specification of a key wins, while it appears that in hstore, the first specification of a key wins. This means the results can be inconsistent with what you would expect coming from a JSON. One way to deal with that is run it through a sort of no-op PL/v8 function, which clears out the extraneous key specifications. That's probably unnecessary with jsonb.Pianette
@Pianette or, you can just add a SELECT DISTINCT ON (key) & ORDER BY key to my solution's function (ASC|DESC depending on what value you want to preserve -- or even a window-function can filter them out)Charisecharisma
Just ordering by key wouldn't necessarily produce the same order as within the JSON, and the value doesn't give you the info to do that either. You could do something like SELECT key, $1->>key FROM (SELECT DISTINCT key FROM json_object_keys($1) json_key (key)) distinct_key, but this isn't really any better than a no-op PL/v8 function, imo.Pianette
@jpmc26: Excellent point. (Actually, the winner among duplicate keys is undefined for hstore.) I added an alternative version of pozs' smart function to cover the corner case.Californium
C
1

There is "trap" for repeated keys - allowed by both json and hstore input, but unfortunately resolved differently (!). Consider this example value:

json '{"double_key":"key1","foo":null,"double_key":"key2"}'

In json, 'double_key is effectively 'key2'. The manual:

Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.)

Bold emphasis mine.

In hstore, however, for the same order of key/value pairs, 'double_key' might effectively be 'key1'. The manual:

Each key in an hstore is unique. If you declare an hstore with duplicate keys, only one will be stored in the hstore and there is no guarantee as to which will be kept:

Typically, the first instance of a key, but that's an implementation details that might change.

A simple and fast option to always preserve the effective, operative value: cast to jsonb before the conversion. The manual again:

[...] jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

Modifying @pozs's conversion function:

CREATE OR REPLACE FUNCTION json2hstore(json)
  RETURNS hstore AS
$func$
  SELECT hstore(array_agg(key), array_agg(value))
  FROM   jsonb_each_text($1::jsonb)  -- !
$func$ LANGUAGE sql IMMUTABLE STRICT;

Requires Postgres 9.4 or later. Postgres 9.3 has the json type, but not jsonb, yet. A no-op in PL/v8 might be alternative there, like @jpmc mentioned.

Californium answered 8/2, 2019 at 17:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.