Postgresql COPY empty string as NULL not work
Asked Answered
P

3

10

I have a CSV file with some integer column, now it 's saved as "" (empty string).

I want to COPY them to a table as NULL value.

With JAVA code, I have try these:

String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv,DELIMITER ',',  HEADER true)";
String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv,DELIMITER ',', NULL ''  HEADER true)";

I get: PSQLException: ERROR: invalid input syntax for type numeric: ""

String sql = "COPY " + tableName + " FROM STDIN (FORMAT csv,DELIMITER ',', NULL '\"\"'  HEADER true)";

I get: PSQLException: ERROR: CSV quote character must not appear in the NULL specification

Any one has done this before ?

Postexilian answered 26/8, 2017 at 6:4 Comment(2)
Can you try to remove the blackslashes, so you end up with NULL '""'?Metatarsal
Please show some actual lines from your CSV file. Is NULL represented by an actual empty string (i.e. nothing) or by two double-quotes? And always your version of Postgres.Pliam
P
16

I assume you are aware that numeric data types have no concept of "empty string" ('') . It's either a number or NULL (or 'NaN' for numeric - but not for integer et al.)

Looks like you exported from a string data type like text and had some actual empty string in there - which are now represented as "" - " being the default QUOTE character in CSV format.

NULL would be represented by nothing, not even quotes. The manual:

NULL

Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.

You cannot define "" to generally represent NULL since that already represents an empty string. Would be ambiguous.

To fix, I see two options:

  1. Edit the CSV file / stream before feeding to COPY and replace "" with nothing. Might be tricky if you have actual empty string in there as well - or "" escaping literal " inside strings.

  2. (What I would do.) Import to an auxiliary temporary table with identical structure except for the integer column converted to text. Then INSERT (or UPSERT?) to the target table from there, converting the integer value properly on the fly:

-- empty temp table with identical structure
CREATE TEMP TABLE tbl_tmp AS TABLE tbl LIMIT 0;

-- ... except for the int / text column
ALTER TABLE tbl_tmp ALTER col_int TYPE text;

COPY tbl_tmp ...;

INSERT INTO tbl  -- identical number and names of columns guaranteed
SELECT col1, col2, NULLIF(col_int, '')::int  -- list all columns in order here
FROM   tbl_tmp;

Temporary tables are dropped at the end of the session automatically. If you run this multiple times in the same session, either just truncate the existing temp table or drop it after each transaction.

Related:

Pliam answered 26/8, 2017 at 16:16 Comment(1)
at first i didn't trust option 2. Then, i saw it was ErwinVerda
Q
17

Since Postgres 9.4 you now have the ability to use FORCE_NULL. This causes the empty string to be converted into a NULL. Very handy, especially with CSV files (actually this is only allowed when using CSV format).

The syntax is as follow:

COPY table FROM '/path/to/file.csv' 
WITH (FORMAT CSV, DELIMITER ';', FORCE_NULL (columnname));

Further details are explained in the documentation: https://www.postgresql.org/docs/current/sql-copy.html

Querist answered 31/5, 2019 at 10:25 Comment(4)
Option FORCE_NULL is with underscore and should be specified inside the "WITH (...)" clause. For example: COPY table FROM '/path/to/file.csv' WITH (FORMAT CSV, DELIMITER ';', FORCE_NULL (field1, field2, field3));Khichabia
@Khichabia You're right, indeed this is the current preferred syntax. The syntax I've used is still supported, even in version 13, but nonetheless it makes more sense to use the 'standard syntax'. I've corrected my example, thanks!Querist
is there a way to tell force_null ALL fields, without listing them out? Something like FORCE_NULL (*)?Verda
@ZacharyRyanSmith It appears that this is only possible for FORCE_QUOTE, but you can of course try to see if it works.Querist
P
16

I assume you are aware that numeric data types have no concept of "empty string" ('') . It's either a number or NULL (or 'NaN' for numeric - but not for integer et al.)

Looks like you exported from a string data type like text and had some actual empty string in there - which are now represented as "" - " being the default QUOTE character in CSV format.

NULL would be represented by nothing, not even quotes. The manual:

NULL

Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.

You cannot define "" to generally represent NULL since that already represents an empty string. Would be ambiguous.

To fix, I see two options:

  1. Edit the CSV file / stream before feeding to COPY and replace "" with nothing. Might be tricky if you have actual empty string in there as well - or "" escaping literal " inside strings.

  2. (What I would do.) Import to an auxiliary temporary table with identical structure except for the integer column converted to text. Then INSERT (or UPSERT?) to the target table from there, converting the integer value properly on the fly:

-- empty temp table with identical structure
CREATE TEMP TABLE tbl_tmp AS TABLE tbl LIMIT 0;

-- ... except for the int / text column
ALTER TABLE tbl_tmp ALTER col_int TYPE text;

COPY tbl_tmp ...;

INSERT INTO tbl  -- identical number and names of columns guaranteed
SELECT col1, col2, NULLIF(col_int, '')::int  -- list all columns in order here
FROM   tbl_tmp;

Temporary tables are dropped at the end of the session automatically. If you run this multiple times in the same session, either just truncate the existing temp table or drop it after each transaction.

Related:

Pliam answered 26/8, 2017 at 16:16 Comment(1)
at first i didn't trust option 2. Then, i saw it was ErwinVerda
J
0

If we want to replace all blank and empty rows with null then you just have to add emptyasnull blanksasnull in copy command

syntax :

    copy Table_name (columns_list)
    from 's3://{bucket}/{s3_bucket_directory_name + manifest_filename}'
    iam_role '{REDSHIFT_COPY_COMMAND_ROLE}' emptyasnull blanksasnull 
    manifest DELIMITER ',' IGNOREHEADER 1 compupdate off csv gzip;

Note: It will apply for all the records which contains empty/blank values

Janicejanicki answered 17/2, 2023 at 6:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.