Copying specific Columns in Amazon Redshift from S3 databucket

Asked 22/8, 2016 at 9:26 Answered 10/11, 2021 at 16:1

Solved amazon-web-services amazon-s3 amazon-redshift

I have a file in S3 with columns like

CustomerID   CustomerName   ProductID    ProductName   Price   Date

Now the existing SQL table structure in Redshift is like

Date  CustomerID   ProductID    Price

Is there a way to copy the selected data into the existing table structure? The S3 database doesn't have any headers, just the data in this order.

Viniculture answered 22/8, 2016 at 9:26 Comment(0)

This is for the case where the file has more columns than the target load table.

Assuming that CustomerName and ProductName can be NULL fields you have two options.

Load the data in a staging table. Then join the staging table with the reference data to insert data into

COPY staging-tablename
FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret- access-key>'; 

INSERT INTO
 main_tablename
SELECT Date  
  ,CustomerID   
  ,ProductID    
  ,Price
FROM  staging-tablename st;

TRUNCATE TABLE staging-tablename;

ANALYZE main_tablename;

Marbling answered 22/8, 2016 at 18:29 Comment(1)

alter table append instead of insert/truncate – Katharynkathe 10/6, 2022 at 19:56

This is for the case where the file has fewer columns than the target load table.

Assuming that CustomerName and ProductName can be NULL fields you have two options.

Option #1 - Load Directly on the table

    COPY main_tablename
    (Date  
    ,CustomerID   
    ,ProductID    
    ,Price)
    FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret- access-key>';

ANALYZE main_tablename;

Option #2 -- Load the data in a staging table. Then join the staging table with the reference data to insert data into

    COPY staging-tablename
    (Date  
    ,CustomerID   
    ,ProductID    
    ,Price)
    FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
    credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret- access-key>'; 

 INSERT INTO
     main_tablename
SELECT st.CustomerID   
      ,cust.CustomerName   
      ,st.ProductID    
      ,prod.ProductName   
      ,st.Price   
      ,st.Date
FROM  staging-tablename st
INNER JOIN  customer-tablename cust ON ( cust.CustomerID = st.CustomerID)
INNER JOIN  product-tablename prod ON ( prod.ProductID  = st.ProductID );

TRUNCATE TABLE staging-tablename;

ANALYZE main_tablename;

Marbling answered 22/8, 2016 at 10:49 Comment(3)

It's the other way round: The target table has fewer columns than the source data at S3. The copy command will not work in this case because it encounters more columns in the source data than are available in the target table. – Benzene 22/8, 2016 at 17:51

I am sorry. You would have to use a staging table. I posted 2 different answers. one for each case. I am sorry I misread your question initially – Marbling 22/8, 2016 at 18:31

It wasn't my question; I just read through the thread and noticed it. :) – Benzene 22/8, 2016 at 18:35

This is for the case where the file has more columns than the target load table.

Assuming that CustomerName and ProductName can be NULL fields you have two options.

Load the data in a staging table. Then join the staging table with the reference data to insert data into

COPY staging-tablename
FROM 's3://<<YOUR-BUCKET>>/<<YOUR-FILE>>'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret- access-key>'; 

INSERT INTO
 main_tablename
SELECT Date  
  ,CustomerID   
  ,ProductID    
  ,Price
FROM  staging-tablename st;

TRUNCATE TABLE staging-tablename;

ANALYZE main_tablename;

Marbling answered 22/8, 2016 at 18:29 Comment(1)

alter table append instead of insert/truncate – Katharynkathe 10/6, 2022 at 19:56

Yes, there's a way. What you can do is that you can create an external table having the same schema as your file (CustomerID CustomerName ProductID ProductName Price Date) with the S3 location of the file. Then you can run an Insert Into Query Selecting the specific columns as required from the external table.

Note : To create an external table in Redshift, you need to first create an external schema.

Inscrutable answered 10/11, 2021 at 16:1 Comment(2)

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Caylor 11/11, 2021 at 5:12

@Caylor - thanks for highlighting that. I have edited my answer and provide more details for the better understanding of all. – Inscrutable 9/6, 2022 at 10:44

Recommended topics

Hot tags