How to prevent 'invalid input syntax for type json' in Postgres when records contain a mix of JSON and plain strings

I have a text column that contains JSON and also plain text. I want to convert it to JSON and then select a particular property. For example:

user_data
_________
{"user": {"name": "jim"}}
{"user": {"name": "sally"}}
some random data string

I've tried:

select user_data::json#>'{user,name}' from users

I get:

ERROR:  invalid input syntax for type json
DETAIL:  Token "some" is invalid.
CONTEXT:  JSON user_data, line 1: some...

Is it possible to prevent this?

Inclinable answered 31/8, 2017 at 1:6 Comment(0)

If you want to skip the rows with invalid JSON, you must first test if the text is valid JSON. You can do this by creating a function which will attempt to parse the value, and catch the exception for invalid JSON values.

CREATE OR REPLACE FUNCTION is_json(input_text varchar) RETURNS boolean AS $$
  DECLARE
    maybe_json json;
  BEGIN
    BEGIN
      maybe_json := input_text;
    EXCEPTION WHEN others THEN
      RETURN FALSE;
    END;

    RETURN TRUE;
  END;
$$ LANGUAGE plpgsql IMMUTABLE;

When you have that, you could use the is_json function in a CASE or WHERE clause to narrow down the valid values.

-- this can eliminate invalid values
SELECT user_data::json #> '{user,name}'
FROM users WHERE is_json(user_data);

-- or this if you want to fill with NULLs
SELECT
  CASE
    WHEN is_json(user_data)
      THEN user_data::json #> '{user,name}'
    ELSE
      NULL
  END
FROM users;
Tracy answered 31/8, 2017 at 4:39 Comment(2)
to_json(value) might be helpful as well: https://mcmap.net/q/28794/-insert-text-with-single-quotes-in-postgresql – Crossfertilize
I still got errors with maybe_json := input_text;, so I used PERFORM input_text::json #>> '{}'; instead and everything works. – Lodie

Use this function:

create or replace function is_json(text)
returns boolean language plpgsql immutable as $$
begin
    perform $1::json;
    return true;
exception
    when invalid_text_representation then 
        return false;
end $$;

Test:

with users(user_data) as (
values
    ('{"user": {"name": "jim"}}'),
    ('not json'),
    ('{"user": {"name": "sally"}}'),
    ('also not json')
)

select user_data::json#>'{user,name}' as name
from users
where is_json(user_data);

  name   
---------
 "jim"
 "sally"
(2 rows)
Corrigendum answered 31/8, 2017 at 4:49 Comment(0)

Others have already suggested ways to check whether the JSON is valid, but rather than checking first, why not cast it in the same step? I use this function (JSONB, but you can easily change it to JSON):

CREATE OR REPLACE FUNCTION safe_cast_to_jsonb(input TEXT) RETURNS JSONB AS
$$
DECLARE
    output JSONB DEFAULT NULL;
BEGIN
    BEGIN
        output := input::JSONB;
    EXCEPTION
        WHEN OTHERS THEN RAISE NOTICE 'INVALID JSONB';
        RETURN NULL;
    END;
    RETURN output;
END;
$$ LANGUAGE plpgsql;
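For example, the function above can be used to pull a property directly, yielding NULL for rows that don't parse (reusing the users table and path from the question):

```sql
SELECT safe_cast_to_jsonb(user_data) #> '{user,name}' AS name
FROM users;
```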
Anticlerical answered 15/3, 2022 at 8:47 Comment(0)

My solution is to do a pre-select that works around the issue.

Let's say we have a table users with columns id and user_data, where user_data is declared as a text field that usually contains {firstName, lastName}.

  • Note that Postgres JSON operations work fine if a given field is NULL, so you could omit the WHERE in the following statement and still select the rows whose user_data contains invalid JSON (the result set would then only contain the id field for such rows)
  • user_data ~ '^{.*}$' (the "curly brace check") tests whether user_data looks like a JSON object. Danger! Bear in mind that this check is incomplete: malformed JSON such as '{"key1": "value1", "key2"}' still passes the regex, and the cast will then fail
  • Consider the impact on database performance

As an example:

WITH users_json_data AS (
  SELECT 
    id,
    CASE WHEN user_data ~ '^{.*}$' THEN user_data::json END AS user_json
  FROM users
)
SELECT id, users_json_data.user_json->>'firstName' AS first_name, users_json_data.user_json->>'lastName' AS last_name
FROM users
INNER JOIN users_json_data USING (id)
WHERE users_json_data.user_json IS NOT NULL;
Adequacy answered 7/11, 2023 at 9:24 Comment(0)

Here's another solution with better performance:

CREATE OR REPLACE FUNCTION is_json(inp text)
    RETURNS boolean LANGUAGE 'plperl' IMMUTABLE
AS $BODY$
# use JSON; # you need to preload this or use plperlu
my ($in) = @_;
my $chr  = substr($in, 0, 1);

return 'f' if $chr ne '[' && $chr ne '{';

eval {
    decode_json($in);
};

return $@ ? 'f' : 't';
$BODY$;

But of course you need to be familiar with plperl to use this version.

If I run the query:

explain analyze select is_json(val) from mytable

this version has an execution time of about 7 ms, as opposed to klin's plpgsql function, which runs for about 12 ms. The table had 3542 rows at the time of testing. Plpgsql's exception-handling blocks are costly, so avoid them when you can.
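As a side note: on PostgreSQL 16 or later, the built-in pg_input_is_valid function avoids both plpgsql exception handling and plperl entirely (a sketch, assuming the users/user_data names from the question):

```sql
SELECT user_data::json #> '{user,name}'
FROM users
WHERE pg_input_is_valid(user_data, 'json');
```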

Squat answered 2/2 at 15:42 Comment(0)

For me, it worked when I first filtered out the bad rows. Note that the two regex checks must be combined with AND, not OR: no value can match both the object pattern and the array pattern, so an OR condition would return every row.

SELECT id, column_name
FROM schema.table_name
WHERE column_name IS NOT NULL
  AND column_name !~ '^\s*\{.*\}\s*$'
  AND column_name !~ '^\s*\[.*\]\s*$';

This returns the rows where the data is invalid. I deleted those rows.

DELETE FROM schema.table_name WHERE id = '553';

Then I converted the column type from varchar to jsonb.

EDIT: My use case was converting a varchar column to the jsonb type; some rows contained data that made the conversion (the ALTER TABLE statement) fail.
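The conversion step itself can be sketched as follows (hypothetical table/column names matching the ones above; this only succeeds once every remaining value is valid JSON or NULL):

```sql
ALTER TABLE schema.table_name
  ALTER COLUMN column_name TYPE jsonb
  USING column_name::jsonb;
```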

Stocktonontees answered 3/6 at 6:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.