AWS Datapipeline RedShiftCopyActivity - how to specify "columns"
Asked Answered
L

1

9

I am trying to copy a bunch of csv files from S3 to Redshift using the RedShiftCopyActivity and a datapipeline.

This works fine as long as the csv structure matches the table structure. In my case the csv has less columns than the table and then the RedShiftCopyActivity failes with a "Delimiter not found" error in stl_load_errors.

I would like to use the redshift copy command "columns" option. That way I can make it work, but the columns part of the redshift copy command does not seem to be available in the RedShiftCopyActivity.

Anyone any suggestions?

All tips warmly welcomed.

Many thanks upfront.

Peter

Landtag answered 4/12, 2014 at 14:4 Comment(4)
Did you ever find a solution to this problem?Varrian
No solution. We were able to avoid the issue by developing a python script that is started by the pipeline and that invokes the redshift copy command. Less elegant than I'd like but at least it works.Landtag
Thanks @Peter. As it happens, I've just written a Python script to replace the whole Data Pipeline. Had endless issues with the redshiftCopyActivity and the Data Pipeline console.Varrian
The transformSql option for the RedshiftCopyActivity states the following: "The SQL SELECT expression used to transform the input data." Further, it says that data is loaded into a table named staging, which your transforming SELECT statements should be run against, and that the output schema of transformSql must match the final target table's schema. Perhaps this is the option you were looking for?Spout
I
1

I know this is an old question but now you can specify a list of columns to the Redshift COPY command.

COPY tablename (column1 [,column2, ...]) 

When loading data from S3, the column order needs to match the order of the source data. Check out the docs here: Amazon Redshift Column Mapping Options.

Radu

Iscariot answered 4/4, 2016 at 8:29 Comment(2)
Thanks for letting me know Radu. We gave up on DataPipeLine and wrote a python script. For now running on an ec2 instance, but I guess we can consider moving to lambda.Landtag
There is a really good blog post on Amazon Redshift and Lambda. We already use this and works like a charm. You should check it out: A Zero-Administration Amazon Redshift Database LoaderIscariot

© 2022 - 2024 — McMap. All rights reserved.