"ERROR: extra data after last expected column" when using PostgreSQL COPY

Please bear with me as this is my first post.

I'm trying to run the COPY command in PostgreSQL 9.2 to load a tab-delimited table from a .txt file into a PostgreSQL database, like so:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER E'\t');

I've already created an empty table called "raw_data" in the database using the SQL command:

CREATE TABLE raw_data ();

I keep getting the following error message when trying to run the COPY command:

ERROR:  extra data after last expected column
CONTEXT:  COPY raw_data, line 1: "  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  ..."

(The numbers here are supposed to be the column headings)

I'm not sure if it's because I didn't specify table columns when creating the db table, but I'm trying to avoid having to manually enter 800 or so columns.

Any suggestions on how to fix this?

Here's an example of what the .txt file looks like:

        1   2   3   4   5   6   7   8   9
binary1 1   1   0   1   1   1   1   1   1
binary2 1   0   0   1   0   1   1   0   0
binary3 1   0   1   1   1   0   0   1   0
binary4 1   1   1   1   0   1   0   1   0
Slapdash answered 3/5, 2013 at 20:56 Comment(1)
+1, btw. Welcome to Stack Overflow, very nice question for a first post. - Yea

An empty table won't do. You need a table that matches the structure of the input data. Something like:

CREATE TABLE raw_data (
  col1 int
, col2 int
  ...
);

You don't need to declare tab as DELIMITER since that's the default:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt';

800 columns you say? That many columns would typically indicate a problem with your design. Anyway, there are ways to half-automate the CREATE TABLE script.
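If you know the column count up front and all data columns have the same type, you could also just generate the DDL with generate_series(). A sketch (not part of the original answer), assuming 800 boolean columns named col1 ... col800 plus a leading text column for the row labels shown in the question:

SELECT 'CREATE TABLE raw_data (rowname text, '
    || string_agg('col' || g || ' bool', ', ' ORDER BY g)
    || ');'
FROM   generate_series(1, 800) g;   -- the empty raw_data from the question would have to be dropped first

Check the statement it returns and run it; the header line in the file would still have to be skipped or removed before a plain COPY.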

Automation

Assuming simplified raw data

1   2   3   4  -- first row contains "column names"
1   1   0   1  -- tab separated
1   0   0   1
1   0   1   1

Define a different DELIMITER (one that does not occur in the import data at all), and import to a temporary staging table with a single text column:

CREATE TEMP TABLE tmp_data (raw text);

COPY tmp_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER '§');
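To double-check the staging step before building any DDL (just a sanity check, not part of the original recipe):

SELECT count(*) AS lines_loaded FROM tmp_data;  -- should match the number of lines in raw_data.txt
SELECT raw FROM tmp_data LIMIT 1;               -- the header line the DDL below is built from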

This query creates the CREATE TABLE script:

SELECT 'CREATE TABLE tbl (col' || replace (raw, E'\t', ' bool, col') || ' bool)'
FROM   (SELECT raw FROM tmp_data LIMIT 1) t;

A more generic & safer query:

SELECT 'CREATE TABLE tbl('
    ||  string_agg(quote_ident('col' || col), ' bool, ' ORDER  BY ord)
    || ' bool);'
FROM  (SELECT raw FROM tmp_data LIMIT 1) t
     , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);

Returns:

CREATE TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool);

Execute after verifying validity - or execute dynamically if you trust the result:

DO
$$BEGIN
EXECUTE (
   SELECT 'CREATE TABLE tbl (col' || replace(raw, E'\t', ' bool, col') || ' bool)'
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t
   );
END$$;

Then INSERT the data with this query:

INSERT INTO tbl
SELECT (('(' || replace(replace(replace(
                  raw
                , '1',   't')
                , '0',   'f')
                , E'\t', ',')
             || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

Or simpler with translate():

INSERT INTO tbl
SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

The string is converted into a row literal, cast to the newly created table row type and decomposed with (row).*.
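To see what that last step does in isolation, here is a minimal illustration (assuming the four-column tbl created above):

SELECT ('(t,t,f,t)'::tbl).*;  -- returns one row: col1 = t, col2 = t, col3 = f, col4 = t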

All done.

You could put all of that into a plpgsql function, but you'd need to safeguard against SQL injection. (There are a number of related solutions here on SO. Try a search.)
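For reference, here is a rough sketch of what such a function could look like, following the recipe above. The function name and parameters are made up for illustration, identifiers go through format('%I') to guard the dynamic SQL, it needs WITH ORDINALITY (Postgres 9.4+) like the generic query above, and the staging delimiter must be a single-byte character that never occurs in the file (see the comments below):

CREATE OR REPLACE FUNCTION import_binary_matrix(_file text, _tbl text)  -- hypothetical name & parameters
  RETURNS void
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- stage the file, one line per row, in a single text column;
   -- '|' is assumed not to occur in the file (any single-byte character that never occurs will do)
   CREATE TEMP TABLE tmp_import (raw text) ON COMMIT DROP;
   EXECUTE format($sql$COPY tmp_import FROM %L WITH (DELIMITER '|')$sql$, _file);

   -- build and run CREATE TABLE from the header line; %I escapes the identifiers
   EXECUTE (
      SELECT format('CREATE TABLE %I (', _tbl)
          || string_agg(format('%I bool', 'col' || col), ', ' ORDER BY ord)
          || ')'
      FROM  (SELECT raw FROM tmp_import LIMIT 1) t
           , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord)
   );

   -- convert the remaining lines to row literals and insert them
   EXECUTE format(
      $sql$INSERT INTO %I
           SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::%I).*
           FROM  (SELECT raw FROM tmp_import OFFSET 1) t$sql$
    , _tbl, _tbl);
END
$func$;

Illustrative call (the target table name is hypothetical):

SELECT import_binary_matrix('/home/Projects/TestData/raw_data.txt', 'raw_data_wide');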

db<>fiddle here
Old SQL Fiddle

Yea answered 3/5, 2013 at 21:8 Comment(6)
Thanks for the quick reply. Yeah, the data table is actually an 800x1000 tab-delimited file. I've looked at both approaches you mentioned, but I'd like the ability to select specific columns when the table is uploaded to the db instead of having to parse out rows into some table and then select them, so perhaps a semi-automated approach to generating the columns would have to be considered. - Slapdash
@dnak: I added a complete solution. - Yea
Using the COPY into tmp_data on Postgres 9.2 I get ERROR: COPY delimiter must be a single one-byte character ... - Gilus
@HugoKoopmans: "one-byte character" means you can't use a multi-byte character. Check with octet_length(): http://sqlfiddle.com/#!15/d41d8/2406 - Yea
@Erwin: so why does your example then fail in my pg 9.2 as I describe above? - Gilus
@HugoKoopmans: Hard to tell. Some typo maybe? I suggest you start a new question with all the necessary details. Ideally, provide a test case. You can always link to this one for context. - Yea

You can handle the header row in the COPY command directly; check out the HEADER option in COPY, like: COPY raw_data FROM '/path/to/csv/SourceCSVFile.csv' DELIMITER ',' CSV HEADER;
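For the tab-delimited file in the question, the same idea might look like this (a sketch: HEADER only makes COPY skip the first line, so raw_data still needs its columns defined first):

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt' WITH (FORMAT csv, DELIMITER E'\t', HEADER);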

Gilus answered 7/6, 2013 at 11:15 Comment(0)
