Amazon Redshift - Unload to S3 - Dynamic S3 file name

I have been using the UNLOAD statement in Redshift for a while now; it makes it easy to dump a file to S3 and let people analyse it.

The time has come to try to automate it. We have AWS Data Pipeline running several tasks, and I wanted to run a SqlActivity that executes the UNLOAD automatically. I use a SQL script hosted in S3.

The query itself is correct, but what I have been trying to figure out is how to assign the file name dynamically. For example:

UNLOAD('<the_query>')
TO 's3://my-bucket/' || to_char(current_date)
WITH CREDENTIALS '<credentials>'
ALLOWOVERWRITE
PARALLEL OFF

doesn't work, and of course I suspect that you can't evaluate functions (to_char) in the TO clause. Is there any other way I can do it?

And if UNLOAD is not the way, what other options do I have to automate such tasks with the currently available infrastructure (Redshift + S3 + Data Pipeline; our Amazon EMR is not active yet)?

The only thing I thought could work (but I am not sure) is, instead of pointing to a script file, to copy the script into the Script option of the SqlActivity (at the moment it points to a file) and reference #{@scheduledStartTime}.
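
If that works, the inline script would presumably look something like the sketch below. This assumes Data Pipeline substitutes the expression into the script text before it reaches Redshift; the bucket, query, and date format are placeholders:

UNLOAD ('select * from my_table')
TO 's3://my-bucket/#{format(@scheduledStartTime, 'YYYY-MM-dd')}/'
WITH CREDENTIALS '<credentials>'
ALLOWOVERWRITE
PARALLEL OFF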

Glendon answered 14/4, 2016 at 7:47 Comment(0)

Why not use RedshiftCopyActivity to copy from Redshift to S3? The input is a RedshiftDataNode and the output is an S3DataNode, where you can specify an expression for directoryPath.

You can also specify the transformSql property in RedshiftCopyActivity to override the default query of select * from <inputRedshiftTable>.
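
For example, a hypothetical transformSql value that restricts the export to the run date (the table and column names here are assumptions, not part of the sample pipeline below):

select * from orders where trunc(order_date) = trunc(sysdate)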

Sample pipeline:

{ "objects": [{ "id": "CSVId1", "name": "DefaultCSV1", "type": "CSV" }, { "id": "RedshiftDatabaseId1", "databaseName": "dbname", "username": "user", "name": "DefaultRedshiftDatabase1", "*password": "password", "type": "RedshiftDatabase", "clusterId": "redshiftclusterId" }, { "id": "Default", "scheduleType": "timeseries", "failureAndRerunMode": "CASCADE", "name": "Default", "role": "DataPipelineDefaultRole", "resourceRole": "DataPipelineDefaultResourceRole" }, { "id": "RedshiftDataNodeId1", "schedule": { "ref": "ScheduleId1" }, "tableName": "orders", "name": "DefaultRedshiftDataNode1", "type": "RedshiftDataNode", "database": { "ref": "RedshiftDatabaseId1" } }, { "id": "Ec2ResourceId1", "schedule": { "ref": "ScheduleId1" }, "securityGroups": "MySecurityGroup", "name": "DefaultEc2Resource1", "role": "DataPipelineDefaultRole", "logUri": "s3://myLogs", "resourceRole": "DataPipelineDefaultResourceRole", "type": "Ec2Resource" }, { "myComment": "This object is used to control the task schedule.", "id": "DefaultSchedule1", "name": "RunOnce", "occurrences": "1", "period": "1 Day", "type": "Schedule", "startAt": "FIRST_ACTIVATION_DATE_TIME" }, { "id": "S3DataNodeId1", "schedule": { "ref": "ScheduleId1" }, "directoryPath": "s3://my-bucket/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}", "name": "DefaultS3DataNode1", "dataFormat": { "ref": "CSVId1" }, "type": "S3DataNode" }, { "id": "RedshiftCopyActivityId1", "output": { "ref": "S3DataNodeId1" }, "input": { "ref": "RedshiftDataNodeId1" }, "schedule": { "ref": "ScheduleId1" }, "name": "DefaultRedshiftCopyActivity1", "runsOn": { "ref": "Ec2ResourceId1" }, "type": "RedshiftCopyActivity" }] }

Archidiaconal answered 19/4, 2016 at 18:4 Comment(1)
How could this unload data in partitions based on a column value? With the column name in the file path, say writing into file paths partitioned by transaction date or similar? - Iodide

Are you able to SSH into the cluster? If so, I would suggest writing a shell script where you can create variables and so on, then pass those variables into a connection's statement/query.
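
A minimal sketch of the idea, assuming psql as the client: the shell computes the date and passes it in with -v, and psql substitutes :'s3_path' as a quoted literal before the statement reaches Redshift (the file name, variable name, and query are placeholders):

-- unload_daily.sql, invoked from the shell script as e.g.:
--   psql -h <cluster-endpoint> -p 5439 -d dbname -v s3_path="s3://my-bucket/$(date +%F)/" -f unload_daily.sql
UNLOAD ('select * from orders')
TO :'s3_path'
WITH CREDENTIALS '<credentials>'
ALLOWOVERWRITE
PARALLEL OFF;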

Crossgrained answered 20/4, 2016 at 23:44 Comment(0)

You can wrap the UNLOAD statement in a Redshift stored procedure and derive the S3 path name dynamically.

In your job, call the procedure; it builds the UNLOAD statement as a string at run time and executes it.

This way you can avoid the other services, but it depends on what kind of use case you are working on.
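
A minimal sketch of such a wrapper, assuming the cluster has an attached IAM role (the procedure name, query, and role ARN are placeholders):

CREATE OR REPLACE PROCEDURE unload_with_date(p_bucket varchar)
AS $$
DECLARE
  v_path varchar(256);
BEGIN
  -- build the target path from the current date, e.g. s3://my-bucket/2021-04-16/
  v_path := 's3://' || p_bucket || '/' || to_char(current_date, 'YYYY-MM-DD') || '/';
  -- UNLOAD does not accept expressions in TO, so assemble the whole statement as text
  EXECUTE 'UNLOAD (''select * from orders'') '
       || 'TO ' || quote_literal(v_path) || ' '
       || 'IAM_ROLE ''arn:aws:iam::123456789012:role/MyUnloadRole'' '
       || 'ALLOWOVERWRITE PARALLEL OFF';
END;
$$ LANGUAGE plpgsql;

-- the scheduled job then only needs:
CALL unload_with_date('my-bucket');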

Rhetorical answered 16/4, 2021 at 13:14 Comment(1)
Do you have a link to documentation? - Headwind
