How to safely unload/copy a table in Redshift?

In Redshift, it is convenient to use unload/copy to move data to S3 and load it back into Redshift, but I find it hard to choose the delimiter each time. The right delimiter depends on the content of the table! I have had to change the delimiter every time I hit load errors.

For example, when I use the following command to unload/copy a table:

unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;

copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;

I will get a load error if the table happens to have a field whose content is "||". Then I have to change the delimiter '|' to another one like ',' and try again; if I'm unlucky, it takes multiple tries to succeed.

I'm wondering if there's a way to unload/copy a Redshift table that is independent of the table's content and will always succeed, no matter what weird strings are stored in the table.

Renfrew answered 16/10, 2014 at 17:52 Comment(1)
It is better not to load free-text fields at all. Why do you want fields like "comment" or "description" in an analytical DB at all? They will take a lot of your disk space and give little benefit for your queries. - Latitudinarian

Finally I figured out the right approach: add escape to both the unload and copy commands:

unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;

copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;

With escape in the unload command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:

  • Linefeed: \n
  • Carriage return: \r
  • The delimiter character specified for the unloaded data.
  • The escape character: \
  • A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command).
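
For illustration (this value is made up, not from the question), with both addquotes and escape a VARCHAR value such as

a|b "x" \c

would end up in the unloaded file roughly as

"a\|b \"x\" \\c"

so the embedded delimiter, quotes, and backslash can no longer be mistaken for field boundaries when the file is copied back.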

And with escape in the copy command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
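
To see the whole round trip in one place, here is a minimal sketch; the table, columns, and sample values are hypothetical, and only the bucket and credential placeholders follow the question:

-- a table with values that would break a plain unload/copy
create table tbl_nasty (id int, txt varchar(256));
insert into tbl_nasty values
  (1, 'contains || the delimiter'),
  (2, 'has "double quotes" inside');

-- unload with addquotes + escape so special characters are protected
unload ('select * from tbl_nasty')
to 's3://s3bucket/tbl_nasty'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
delimiter '|' addquotes escape allowoverwrite;

-- load into a structurally identical table with removequotes + escape
create table tbl_nasty_copy (like tbl_nasty);

copy tbl_nasty_copy
from 's3://s3bucket/tbl_nasty'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
delimiter '|' removequotes escape;

After the copy, tbl_nasty_copy should contain the same two rows, embedded pipes and quotes included.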

Renfrew answered 17/10, 2014 at 16:42 Comment(1)
I am using the same commands as mentioned but unfortunately getting an error. My unload command creates a file whose lines look like `"51"|"2017-11-13 15:53:21"|"02...`. The second column is my date column, but whenever I run the copy command it reads 51, i.e. the first column, as the second column. I can't understand what the issue could be. - Bourgeois

Try unloading like below:

 unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape

To load it back, use the following:

copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;

This will work even if your data has ',' characters in it.

Albie answered 17/10, 2014 at 7:13 Comment(0)

This is a very old thread, but I think the best solution here is to unload and copy using the Parquet format.

It will try to preserve the type of each column you have and map it to the most compatible data type it supports.

Example:

UNLOAD ('QUERY HERE')
TO 's3://your-path-here'
IAM_ROLE 'xxxxxxxx'
FORMAT AS PARQUET;

For the copy, use this:

COPY my_table_name
FROM 's3://your-path-here'
IAM_ROLE 'xxxxxxxx'
FORMAT AS PARQUET;
Bastardize answered 9/3, 2023 at 0:56 Comment(0)

Since this topic comes up in many places, we decided to package the UNLOAD/extract process into a Docker service. All the code is on GitHub, so you can use it as-is or grab the underlying Python code to create your own version: https://github.com/openbridge/ob_redshift_unload

You can set the delimiter, dates, and ad hoc SQL via run-time configuration. It will also export a header row, something that is a little more complicated to do otherwise.
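
(If your cluster version supports it, the HEADER option on UNLOAD, paired with IGNOREHEADER on COPY, can also produce and then skip a header row without extra tooling. A sketch, reusing the placeholder bucket and credentials style from the question:

unload ('select * from tbl_example')
to 's3://s3bucket/tbl_example'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
delimiter '|' addquotes escape allowoverwrite header;

copy tbl_example2
from 's3://s3bucket/tbl_example'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
delimiter '|' removequotes escape ignoreheader 1;
)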

Here are a few of the runtime options:

-t: The table you wish to UNLOAD
-f: The S3 key at which the file will be placed
-s (Optional): The file from which to read a custom, valid SQL WHERE clause. This will be sanitized and then inserted into the UNLOAD command.
-r (Optional): The range column you wish to use to constrain the results. Any type supported by Redshift's BETWEEN function is accepted here (date, integer, etc.)
-r1 (Optional): The desired start range to constrain the result set
-r2 (Optional): The desired end range to constrain the result set

Note: -s and -d are mutually exclusive and cannot be used together. If neither is used, the script will default to not specifying a WHERE clause and will output the entire table.

Then you can run it like this to UNLOAD:

docker run -it -v /local/path/to/my/config.json:/config.json openbridge/ob_redshift_unload python /unload.py -t mytable -f s3://dest-bucket/foo/bar/output_file.csv -r datecol -r1 2017-01-01 -r2 2017-06-01

The goal was to enhance the default UNLOAD process and wrap it in something that helps ensure consistent outputs.

Here is a write-up that details the features/capabilities: https://blog.openbridge.com/how-to-easily-extract-data-from-amazon-redshift-4e55435f7003

Towline answered 14/8, 2017 at 13:7 Comment(0)
