Error parsing JSON: more than one document in the input (Redshift to Snowflake SQL)
I'm trying to convert a query from Redshift to Snowflake SQL.

The Redshift query looks like this:

SELECT 
                cr.creatives as creatives
                , JSON_ARRAY_LENGTH(cr.creatives) as creatives_length
                , JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT (cr.creatives,0),'previewUrl') as preview_url
FROM campaign_revisions cr

The Snowflake query looks like this:

SELECT 
                cr.creatives as creatives
                , ARRAY_SIZE(TO_ARRAY(ARRAY_CONSTRUCT(cr.creatives))) as creatives_length
                , PARSE_JSON(PARSE_JSON(cr.creatives)[0]):previewUrl as preview_url
FROM campaign_revisions cr

It seems like JSON_EXTRACT_PATH_TEXT isn't converted correctly, as the Snowflake query results in error:

Error parsing JSON: more than one document in the input

cr.creatives is formatted like this:

"[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]"
Nocturnal answered 15/10, 2021 at 12:47 Comment(3)
Check the JSON and how it got into Snowflake. The JSON is not valid. It should not start with double quotes. JSON should normally start with a left square bracket [ or a left curly bracket { (rare exceptions are when the JSON consists of a single string, null, true, or false value). Also, there are doubled double quotes "" throughout the JSON, which are also not valid JSON.Jauregui
you can also use check_json function to validate the json. Details: docs.snowflake.com/en/sql-reference/functions/check_json.htmlEmilie
Thank you - changing the double quotes "" to single quotes " in the JSON solved it!Nocturnal
It seems to me that you are not working with valid JSON data inside Snowflake.

Please review your file format used for the copy into command.

If you open the "JSON" text provided in a text editor, note that the information is not parsed or formatted as JSON because of the quoting you have. Once your issue with the double quotes / escaped quotes is handled, you should be able to make good progress.
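Once the doubled quotes are repaired and cr.creatives holds valid JSON text, a minimal sketch of the converted query (assuming creatives is a string column) could look like this:

SELECT
    cr.creatives AS creatives,
    ARRAY_SIZE(PARSE_JSON(cr.creatives)) AS creatives_length,
    PARSE_JSON(cr.creatives)[0]:previewUrl::STRING AS preview_url
FROM campaign_revisions cr;

PARSE_JSON returns a VARIANT, so ARRAY_SIZE can be applied to it directly, and the path/element extraction collapses into a single [0]:previewUrl access.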

(Screenshot: proper JSON on the left, the original invalid data on the right.)

If you are not inclined to reload your data, see if you can create a JavaScript user-defined function to remove the quotes from your string; then you can use Snowflake to process the variant column.

The following is a working JavaScript snippet that removes the doubled quotes for you.

var textOriginal = '[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]';

// Replace each doubled quote with a single quote, then parse the result.
function parseText(input) {
    var a = input.replace(/""/g, '"');
    return JSON.parse(a);
}

var x = parseText(textOriginal);
console.log(x);
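In Snowflake itself, a sketch of wrapping this logic in a JavaScript UDF might look like the following (the function name is hypothetical; note that Snowflake uppercases JavaScript UDF argument names):

CREATE OR REPLACE FUNCTION strip_doubled_quotes(s STRING)
RETURNS VARIANT
LANGUAGE JAVASCRIPT
AS
$$
    // S is the uppercased argument name; fix the quoting, then parse.
    return JSON.parse(S.replace(/""/g, '"'));
$$;

-- Usage sketch against the original table:
SELECT strip_doubled_quotes(cr.creatives)[0]:previewUrl::STRING AS preview_url
FROM campaign_revisions cr;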


Cohlette answered 20/10, 2021 at 2:34 Comment(0)

For anyone else seeing this double double quote issue in JSON fields coming from CSV files in a Snowflake external stage (slightly different issue than the original question posted):

The issue is likely that you need to use the FIELD_OPTIONALLY_ENCLOSED_BY setting. Specifically, FIELD_OPTIONALLY_ENCLOSED_BY = '"' when setting up your file format. (docs)

Example of creating such a file format:

create or replace file format mydb.myschema.my_tsv_file_format
type = CSV
field_delimiter = '\t'
FIELD_OPTIONALLY_ENCLOSED_BY = '"';

And example of querying from a stage using this file format:

select
  $1 field_one,
  $2 field_two
  -- ...and so on
from '@my_s3_stage/path/to/file/my_tab_separated_file.csv' (file_format => 'my_tsv_file_format')
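The same file format also applies when loading the data into a table. A sketch, assuming a hypothetical target table campaign_revisions in the same schema:

copy into mydb.myschema.campaign_revisions
from @my_s3_stage/path/to/file/
file_format = (format_name = 'mydb.myschema.my_tsv_file_format');

With FIELD_OPTIONALLY_ENCLOSED_BY = '"' in the format, the loader strips the enclosing quotes and un-doubles the embedded "" pairs, so the JSON lands in the table in valid form.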
Pluton answered 8/11, 2022 at 23:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.