How to add a sort key to an existing table in AWS Redshift
Asked Answered
B

9

41

In AWS Redshift, I want to add a sort key to a table that is already created. Is there any command which can add a column and use it as sort key?

Berni answered 26/7, 2013 at 14:35 Comment(1)
Updated correct answer is below: https://mcmap.net/q/385785/-how-to-add-a-sort-key-to-an-existing-table-in-aws-redshiftTootle
S
34

UPDATE:

Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.

source: https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-redshift-supports-changing-table-sort-keys-dynamically/

At the moment I think its not possible (hopefully that will change in the future). In the past when I ran into this kind of situation I created a new table and copied the data from the old one into it.

from http://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html:

ADD [ COLUMN ] column_name Adds a column with the specified name to the table. You can add only one column in each ALTER TABLE statement.

You cannot add a column that is the distribution key (DISTKEY) or a sort key (SORTKEY) of the table.

You cannot use an ALTER TABLE ADD COLUMN command to modify the following table and column attributes:

UNIQUE

PRIMARY KEY

REFERENCES (foreign key)

IDENTITY

The maximum column name length is 127 characters; longer names are truncated to 127 characters. The maximum number of columns you can define in a single table is 1,600.

Silique answered 27/7, 2013 at 16:40 Comment(2)
Updated correct answer is below: https://mcmap.net/q/385785/-how-to-add-a-sort-key-to-an-existing-table-in-aws-redshiftTootle
This ALTER TABLE still not working for INTERLEAVED sortkeys, only apply for COMPOUND ones (Feb 2022). To change INTERLEAVED you have to create a new table and copy data over.Bravissimo
D
32

To add to Yaniv's answer, the ideal way to do this is probably using the CREATE TABLE AS command. You can specify the distkey and sortkey explicitly. I.e.

CREATE TABLE test_table_with_dist 
distkey(field) 
sortkey(sortfield) 
AS 
select * from test_table

Additional examples:

http://docs.aws.amazon.com/redshift/latest/dg/r_CTAS_examples.html

EDIT

I've noticed that this method doesn't preserve encoding. Redshift only automatically encodes during a copy statement. If this is a persistent table you should redefine the table and specify the encoding.

create table test_table_with_dist(
    field1 varchar encode row distkey
    field2 timestam pencode delta sortkey);

insert into test_table select * from test_table;

You can figure out which encoding to use by running analyze compression test_table;

Decent answered 15/4, 2015 at 17:30 Comment(0)
S
31

As Yaniv Kessler mentioned, it's not possible to add or change distkey and sort key after creating a table, and you have to recreate a table and copy all data to the new table. You can use the following SQL format to recreate a table with a new design.

ALTER TABLE test_table RENAME TO old_test_table;
CREATE TABLE new_test_table([new table columns]);
INSERT INTO new_test_table (SELECT * FROM old_test_table);
ALTER TABLE new_test_table RENAME TO test_table;
DROP TABLE old_test_table;

In my experience, this SQL is used for not only changing distkey and sortkey, but also setting the encoding(compression) type.

Sarilda answered 21/11, 2013 at 7:27 Comment(5)
This is the best answer. I might add that it could be safer to do the table rename after the operation is complete.Backtrack
This is pretty late, but I'm in the same situation and I did some digging to what the performance for this type of operation would be, assuming the table is large. Redshift docs mention that INSERT INTO should be used with caution, favoring the COPY or CREATE TABLE AS commands. The examples for CTAS mention this problem explicitly.Decent
The problem with this method is that if you have views depending on the original table, your drop will not work.Raptor
Why do you alter table twice? Isn't test_table name already free after first alter table, so we could create table with this name?Whiteman
ALTER TABLE test_table RENAME TO old_test_table; CREATE TABLE test_table([new table columns]); INSERT INTO test_table (SELECT * FROM old_test_table); DROP TABLE old_test_table; ;)Yancey
L
21

AWS now allows you to add both sortkeys and distkeys without having to recreate tables:

TO add a sortkey (or alter a sortkey):

ALTER TABLE data.engagements_bot_free_raw ALTER SORTKEY (id)

To alter a distkey or add a distkey:

ALTER TABLE data.engagements_bot_free_raw ALTER DISTKEY id

Interestingly, the paranthesis are mandatory on SORTKEY, but not on DISTKEY.

You still cannot inplace change the encoding of a table - that still requires the solutions where you must recreate tables.

Lajuanalake answered 30/10, 2019 at 22:37 Comment(0)
R
1

I followed this approach for adding the sort columns to my table table_transactons its more or less same approach only less number of commands.

alter table table_transactions rename to table_transactions_backup;
create table table_transactions compound sortkey(key1, key2, key3, key4) as select * from table_transactions_backup;
drop table table_transactions_backup;
Rybinsk answered 24/1, 2018 at 6:10 Comment(0)
L
1

Catching this query a bit late.
I find that using 1=1 the best way to create and replicate data into another table in redshift eg: CREATE TABLE NEWTABLE AS SELECT * FROM OLDTABLE WHERE 1=1;

then you can drop the OLDTABLE after verifying that the data has been copied

(if you replace 1=1 with 1=2, it copies only the structure - which is good for creating staging tables)

Loredo answered 31/10, 2019 at 3:10 Comment(0)
B
1

it is now possible to alter a sort kay:

Amazon Redshift now supports changing table sort keys dynamically

Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.

Customers when creating Redshift tables can optionally specify one or more table columns as sort keys. The sort keys are used to maintain the sort order of the Redshift tables and allows the query engine to achieve high performance by reducing the amount of data to read from disk and to save on storage with better compression. Currently Redshift customers who desire to change the sort keys after the initial table creation will need to re-create the table with new sort key definitions.

With the new ALTER SORT KEY command, users can dynamically change the Redshift table sort keys as needed. Redshift will take care of adjusting data layout behind the scenes and table remains available for users to query. Users can modify sort keys for a given table as many times as needed and they can alter sort keys for multiple tables simultaneously.

For more information ALTER SORT KEY, please refer to the documentation.

documentation

as for the documentation itself:

ALTER DISTKEY column_name or ALTER DISTSTYLE KEY DISTKEY column_name A clause that changes the column used as the distribution key of a table. Consider the following:

VACUUM and ALTER DISTKEY cannot run concurrently on the same table.

If VACUUM is already running, then ALTER DISTKEY returns an error.

If ALTER DISTKEY is running, then background vacuum doesn't start on a table.

If ALTER DISTKEY is running, then foreground vacuum returns an error.

You can only run one ALTER DISTKEY command on a table at a time.

The ALTER DISTKEY command is not supported for tables with interleaved sort keys.

When specifying DISTSTYLE KEY, the data is distributed by the values in the DISTKEY column. For more information about DISTSTYLE, see CREATE TABLE.

ALTER [COMPOUND] SORTKEY ( column_name [,...] ) A clause that changes or adds the sort key used for a table. Consider the following:

You can define a maximum of 400 columns for a sort key per table.

You can only alter a compound sort key. You can't alter an interleaved sort key.

When data is loaded into a table, the data is loaded in the order of the sort key. When you alter the sort key, Amazon Redshift reorders the data. For more information about SORTKEY, see CREATE TABLE.
Bortman answered 12/12, 2019 at 12:25 Comment(0)
E
0

According to the updated documentation it is now possible to change a sort key type with:

ALTER [COMPOUND] SORTKEY ( column_name [,...] )

For reference (https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html):

  • "You can alter an interleaved sort key to a compound sort key or no sort key. However, you can't alter a compound sort key to an interleaved sort key."
Expressivity answered 23/11, 2021 at 0:30 Comment(0)
M
0

ALTER TABLE table_name ALTER SORTKEY (sortKey1, sortKey2 ...etc)

Mckoy answered 7/7, 2022 at 16:22 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Aerial

© 2022 - 2024 — McMap. All rights reserved.