How to unload only 1000 rows from Redshift tables

I am creating a development-level copy of a production Redshift database. I know how to unload data from my production instance/cluster to S3, then copy that data into my development instance/cluster, but only if I unload all the data at once. What I would like to do instead is copy just 1000 or so rows from each of my tables, to cut down on space and transfer time between my Redshift instances.

e.g.

UNLOAD ('SELECT * FROM myschema.mytable LIMIT 1000') TO 's3://my-bucket' CREDENTIALS etcetcetc

Is there a way to do this LIMIT with UNLOAD, or am I going to have to switch to a bulk-insert-style paradigm?

EDIT: I am programmatically unloading and copying a bunch of tables, so I don't want to hard-code any key-based limits in case we add new tables or change table structures, etc.

Mitra answered 21/8, 2015 at 21:10 Comment(0)

While "LIMIT" is not part of the actual "UNLOAD" command, the Redshift documentation on UNLOAD provides a few alternatives:

Limit Clause

The SELECT query cannot use a LIMIT clause in the outer SELECT. For example, the following UNLOAD statement will fail:

unload ('select * from venue limit 10') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'; 

Instead, use a nested LIMIT clause. For example:

unload ('select * from venue where venueid in 
(select venueid from venue order by venueid desc limit 10)') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

Alternatively, you could populate a table with SELECT…INTO or CREATE TABLE AS using a LIMIT clause, then unload from that table.
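
A minimal sketch of that staging-table approach (the staging table name is made up here, and the credential placeholders are the same ones used above; adjust both for your environment):

-- Stage a small sample with CREATE TABLE AS, then unload the staging table.
create table venue_sample as
select * from venue limit 10;

unload ('select * from venue_sample')
to 's3://mybucket/venue_sample_'
credentials
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

-- Clean up once the unload has finished.
drop table venue_sample;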

Versus answered 21/8, 2015 at 21:20 Comment(3)
Ah. I should have mentioned that I don't know anything about the tables I'm copying ahead of time; I'm generating a list of unload statements from a query provided in Amazon's redshift utils (github.com/awslabs/amazon-redshift-utils). This is a good start though, let me see if I can build on this... – Mitra
I was able to accomplish what I was doing by creating temporary tables with only 1000 rows in them, then unloading those tables instead of the full tables. E.g.: CREATE TEMPORARY TABLE temp_table_for_loading_things AS SELECT * FROM schema.table_to_copy LIMIT 1000; UNLOAD ('SELECT * FROM temp_table_for_loading_things') TO 's3://tmp_bucket' CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=PASS' GZIP DELIMITER '\t'; – Mitra
Alternatively, you can do unload('select temp.* from (select col1, col2 from venue order by venueid desc limit 10) as temp') to 's3://' ... – Clear

It is possible to apply a LIMIT when you unload from a Redshift table by wrapping your LIMIT query in an outer SELECT * FROM ( ... ).

Example with your case:

UNLOAD ('SELECT * FROM myschema.mytable LIMIT 1000') 
TO 's3://my-bucket' CREDENTIALS etcetcetc

Becomes

UNLOAD ('SELECT * FROM 
                       (SELECT * FROM myschema.mytable LIMIT 1000)'
       ) 
TO 's3://my-bucket' CREDENTIALS etcetcetc
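
Since the question mentions generating these UNLOAD statements programmatically for many tables, here is a hedged sketch that builds one wrapped-LIMIT UNLOAD per user table from the pg_tables catalog; the bucket path and the credential placeholders are assumptions, not values from the question:

-- Generate one UNLOAD statement per user table, each capped at 1000 rows.
-- Run this query, then execute the statements it returns.
SELECT 'UNLOAD (''SELECT * FROM (SELECT * FROM '
       || schemaname || '.' || tablename || ' LIMIT 1000)'') '
       || 'TO ''s3://my-dev-bucket/' || schemaname || '.' || tablename || '_'' '
       || 'CREDENTIALS ''aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'' '
       || 'GZIP;'
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

Identifiers that need quoting (mixed case, special characters) would require extra handling, so treat this only as a starting point.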
Gefen answered 28/4, 2022 at 9:39 Comment(0)

If your table was created with any distribution style other than ALL, you may not need the LIMIT concept at all.

Say you created your table with the EVEN distribution style (the default) and the cluster has 4 compute nodes; the unload will then generate 4 (or more) files per table in Amazon S3, each holding only a portion of the rows, so you could copy just one of those files into your development cluster as a sample.
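
If you want to check how your tables are distributed before relying on this, here is a hedged sketch against the SVV_TABLE_INFO system view (assuming your user has access to it):

-- List each user table with its distribution style (EVEN, KEY(...), ALL, ...).
SELECT "schema", "table", diststyle
FROM svv_table_info
ORDER BY "schema", "table";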

Unprovided answered 24/8, 2015 at 10:21 Comment(0)
