Redshift Error 1202 "Extra column(s) found" using COPY command

I'm getting a 1202 "Extra column(s) found" error in Redshift when trying to load a simple CSV. I've made sure there are no additional columns and no unescaped characters in the file that would cause the COPY command to fail with this error.

Here's the created target table:

create table test_table(
  name varchar(500),
  email varchar(500),
  developer_id integer,
  developer_name varchar(500),
  country varchar(20),
  devdatabase varchar(50));

I'm using a simple CSV with no header and only 3 rows of data:

john smith,[email protected],123,johndev,US,comet
jane smith,[email protected],124,janedev,GB,titan
jack smith,[email protected],125,jackdev,US,comet

Unfortunately my COPY command fails with err_1202 "Extra column(s) found".

COPY test_table 
FROM 's3://mybucket/test/test_contacts.csv'    
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
CSV;

There are no additional columns in the file.

Vendue answered 14/3, 2016 at 21:35 Comment(4)
I followed your steps and successfully imported the data into a Redshift table. I've cleaned your question (removed schema name, closed credentials quote, mentioned bucket name), so you might want to confirm that it still matches your situation. I saved the data as a text file in an S3 bucket (not zipped).Helios
Sometimes names contain a comma (,); you may need to go through your data and quote those fields.Nusku
Did you check the stl_load_errors table, or are you looking at the error message from your SQL client? SELECT err_reason, raw_line, err_code, query, session, tbl FROM stl_load_errors WHERE filename LIKE 's3://mybucket/test/test_contacts%' ORDER BY query DESC, starttime DESCGreiner
Change your delimiter to ~; if that doesn't help, check that your table schema is correct when importing into your environment.Kerwon

I was facing the same issue while loading data. I resolved it with the following COPY options:

copy yourtablename
from 'your S3 Locations'
credentials 'your AWS credentials' 
delimiter ',' IGNOREHEADER 1 
removequotes
emptyasnull
blanksasnull
maxerror 5;
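
Applied to the table and file from the question, this might look like the sketch below (an adaptation on my part, not the original poster's command); note that the sample CSV has no header row, so IGNOREHEADER is dropped here:

copy test_table
from 's3://mybucket/test/test_contacts.csv'
credentials 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
delimiter ','
removequotes
emptyasnull
blanksasnull
maxerror 5;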
Insignificancy answered 27/11, 2017 at 8:56 Comment(1)
This did not work for me.Mycah

Try this:

COPY test_table 
FROM 's3://mybucket/test/test_contacts.csv'    
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
delimiter ',' 
ignoreheader as 1 
emptyasnull
blanksasnull
removequotes
escape;

Source: https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#r_COPY_command_examples-copy-data-with-the-escape-option
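
For context, the ESCAPE option treats a backslash in the input as an escape character, so a value containing the delimiter can be loaded literally. An illustrative source line (not from the original post) could look like this:

smith\, john,[email protected],123,johndev,US,comet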

Pereyra answered 3/7, 2018 at 22:2 Comment(0)

Make sure the correct delimiter is specified in the COPY statement (and used in the source files). I ran into the same issue. After a couple of attempts with different delimiters (while unloading a table to S3 files, then copying into another table from those files), I was able to solve the issue by using the delimiter '\t'. Here is the full example in my case:

copy <TABLE-NAME>
from 's3://<FILES/LOCATION>'
access_key_id '<INSERT>'
secret_access_key '<INSERT>'
delimiter '\t'
ignoreheader 1
maxerror 10;
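
For completeness, here is a sketch of a matching UNLOAD that produces tab-delimited files with a header row (assuming the same key-based credentials; this is not part of the original answer). The HEADER option writes the header line that IGNOREHEADER 1 then skips on load:

unload ('select * from <TABLE-NAME>')
to 's3://<FILES/LOCATION>'
access_key_id '<INSERT>'
secret_access_key '<INSERT>'
delimiter '\t'
header
allowoverwrite;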
Nap answered 12/11, 2020 at 3:55 Comment(0)

This mostly happens because you are using CSV format, which by default has ',' as the delimiter, and your data contains fields whose values include ','. This causes rows to have extra columns when you try to load them into Redshift. There are quite a few ways to fix it, and it becomes straightforward once you have identified which column contains commas in its values. You can identify the columns by looking at stl_load_errors:

SELECT starttime, err_reason,raw_line,err_code,query,session,tbl FROM stl_load_errors WHERE filename like 's3://mybucket/test/%' ORDER BY query DESC, starttime DESC

Then fix the column that produces the extra columns. Say, in this example, the 'name' column has extra commas; we can clean that data with PySpark:

from pyspark.sql import functions as F

# replace embedded commas in 'name' so they are not treated as field delimiters
df = df.withColumn('name', F.regexp_replace(F.col('name'), ',', ' '))

Store the new dataframe in S3 and then use the COPY command below to load it into Redshift:

COPY table_name
FROM 's3 path'
IAM_ROLE 'iam role'
DELIMITER ','
ESCAPE
IGNOREHEADER 1
MAXERROR AS 5
COMPUPDATE FALSE
ACCEPTINVCHARS
ACCEPTANYDATE
FILLRECORD
EMPTYASNULL
BLANKSASNULL
NULL AS 'null';
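
Alternatively, if the fields that contain commas are enclosed in double quotes in the source file, COPY's CSV format handles the embedded delimiters without any preprocessing. A minimal sketch against the question's table (my assumption, not part of this answer):

COPY test_table
FROM 's3://mybucket/test/test_contacts.csv'
IAM_ROLE 'iam role'
CSV
MAXERROR AS 5;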
Nucellus answered 27/9, 2022 at 15:37 Comment(0)

Note that Glue is not as robust as one might think; column order plays a major role. Check your target table's column order as well as the table input, and make sure the order and data types are identical. See the AWS Glue Developer Guide for more info.

In addition, make sure you disabled 'Job bookmark' in the 'Job details' tab; for any development or generic job this is a major source of headaches and trouble.
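
One way to double-check the target table's column order and data types on the Redshift side (a hedged sketch using the question's table name, not part of the original answer):

select ordinal_position, column_name, data_type
from svv_columns
where table_name = 'test_table'
order by ordinal_position;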

Alithia answered 22/3, 2022 at 9:28 Comment(0)

For me, it turned out that I had executed the scripts against the wrong database within the cluster.
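
A quick way to confirm which database the session is connected to (a minimal check, not from the original answer):

select current_database();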

Borecole answered 30/12, 2020 at 8:37 Comment(1)
Not directly related to the problem in the question.Snooze