AWS Redshift: should DISTKEY / SORTKEY columns be compressed?

Let me ask something about column compression on AWS Redshift. We are currently investigating how to improve performance by choosing an appropriate diststyle, sort keys, and column compression.

If my understanding is correct, column compression can help reduce I/O cost. I tried "analyze compression table_name;", and Redshift mostly suggests 'zstd' or 'lzo' as the compression encoding for our columns.
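
For context, a minimal sketch of the commands involved (the schema and table names are placeholders):

-- Ask Redshift to suggest an encoding for each column, based on a sample of the data.
analyze compression my_schema.my_table;

-- Check which encodings, dist key and sort keys are currently defined.
-- Note: pg_table_def only lists tables in schemas on your search_path.
SELECT "column", type, encoding, distkey, sortkey
FROM pg_table_def
WHERE schemaname = 'my_schema' AND tablename = 'my_table';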

Generally speaking, should the columns used as DISTKEY/SORTKEY also be compressed like the other columns?

I'm totally new to Redshift and any advice would be appreciated.

Sincerely.

Pinelli answered 3/10, 2018 at 10:55 Comment(0)

The DISTKEY column can be compressed, but the first SORTKEY column should be uncompressed (ENCODE RAW). If you have multiple sort keys (compound), the other sort key columns can be compressed.
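
As a hedged illustration of that advice (the table and column names below are invented), a DDL sketch might look like this:

-- Hypothetical table: the dist key column is compressed, the first sort key column
-- is left uncompressed, and the remaining columns use the encodings suggested by
-- ANALYZE COMPRESSION.
CREATE TABLE sales (
    sale_date   DATE          ENCODE raw,   -- first sort key column: keep uncompressed
    product_id  BIGINT        ENCODE zstd,  -- second sort key column: compression is fine
    customer_id BIGINT        ENCODE zstd,  -- dist key column: compression is fine
    amount      DECIMAL(12,2) ENCODE zstd
)
DISTSTYLE KEY
DISTKEY (customer_id)
COMPOUND SORTKEY (sale_date, product_id);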

Also, I generally recommend using a commonly filtered date/timestamp column (if one exists) as the first column of a compound sort key.

Finally, if you are joining very large tables, try using the same dist and sort keys on both tables so Redshift can use a faster merge join.
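
One hedged way to confirm the merge join is being used (assuming a different hypothetical pair of tables that were both created with DISTKEY and SORTKEY on customer_id) is to inspect the query plan:

-- If both tables are distributed and sorted on the join column (and are mostly sorted,
-- e.g. after VACUUM), the plan should show a "Merge Join" step rather than a "Hash Join".
EXPLAIN
SELECT o.order_date, c.customer_name, o.amount
FROM orders_big o
JOIN customers_big c ON o.customer_id = c.customer_id;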

Russo answered 3/10, 2018 at 12:52 Comment(7)
Joe, can you please cite a reference for using RAW with the first sortkey? That's a new one on me and I cannot find info on it.Polychromy
Explanation of the problems with a highly compressed sort key: github.com/awslabs/amazon-redshift-utils/blob/master/src/… Note that we announced late materialization last Dec. That helps with sort key skew but the general advice still stands: aws.amazon.com/about-aws/whats-new/2017/12/…Russo
Thank you Joe that is really useful!Polychromy
A thousand thanks for your great answer, @Joe Harris!! I'll try your recommendation right away. And thank you so much for explaining what kind of column should be the first sort key; that's really useful information for us :)Pinelli
Do you recommend the same for Interleaved Sort keys? Thanks!Radiant
No they are not affected but I do not recommend using Interleaved sort keys unless you have worked through this guide and are certain it's the right choice. aws.amazon.com/blogs/big-data/…Russo
FWIW, that AWS investigation link is comparing RunLength (highly compressed) vs Delta encodings, with Delta being faster. It doesn't seem to be recommending Raw encoding.Pinon

Complementary to Joe Harris's answer, I'll provide a deeper explanation of his sort key compression recommendation:

While Redshift is a columnar database for disk storage purposes, the sort and distribution keys operate on full rows. To sort your data, the columns in your sort key are converted to 8 byte integers; then their bits are concatenated (compound sort style) or interleaved (interleaved sort style) to form a sortable value. This is all done on uncompressed values, so it is unaffected by your choice of compression. In many cases the sort key gives the intuitive "ascending" sort order, but due to the intermediate 8 byte integer conversion, it can give unexpected results (see full details here); the 8 byte limit can also make the sort key less effective/useful if you are storing large values.
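
As a hedged illustration of that 8 byte limit (the values below are invented), two long VARCHAR sort key values sharing the same first 8 bytes produce the same 8 byte sort value, so the zonemap cannot tell them apart:

-- Both expressions return 'customer': rows with these two sort key values look
-- identical to the zonemap even though they differ after byte 8.
SELECT SUBSTRING('customer_000001', 1, 8) AS zonemap_prefix_a,
       SUBSTRING('customer_999999', 1, 8) AS zonemap_prefix_b;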

Redshift stores min/max 8 byte integer values for every block, which is viewable in STV_BLOCKLIST. However, these values are different than the full concatenated/interleaved sort key that Redshift is using to filter your data. The full values are instead stored in an internal zonemap, which, as far as I know, is inaccessible to end users. The zonemap structure is something like this:

Sortkey min | Sortkey max | Col0 min block | Col0 max block | Col1 min block | Col1 max block | ... up to ColN
0x001B...   | 0x001C...   | 5              | 10             | 0              | 2              | ...
0x00A8      | 0x00FF...   | 12             | 15             | 1              | 4              | ...

The zonemap is generated from the sort key's first column. For every block of that first sort key column, we get its min/max row. We determine the min/max blocks that intersect those rows, for each column. From these min/max blocks, plus the cached min/max 8 byte integers from STV_BLOCKLIST, we can assemble the full, composite sort key min/max values.
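
A hedged sketch of inspecting those cached per-block values (the table name 'orders' and the assumption that the first sort key column is the table's first column, col = 0, are mine):

-- Per-block 8 byte min/max values for one column, as cached in STV_BLOCKLIST.
SELECT b.slice, b.blocknum, b.num_values, b.minvalue, b.maxvalue
FROM stv_blocklist b
JOIN stv_tbl_perm p ON b.tbl = p.id AND b.slice = p.slice
WHERE TRIM(p.name) = 'orders' AND b.col = 0
ORDER BY b.slice, b.blocknum;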

When a query filters by the sort key, it first converts the filter conditions to an equivalent composite sort key (i.e. concatenated/interleaved 8 byte integers). Redshift will then loop through the entries of the zonemap, checking whether the value falls between the min/max sort key. For the entries that do, it marks the blocks that need to be read for each column.
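
One hedged way to see whether this zonemap filtering actually kicked in for a given query is to look for range-restricted scans (the query ID below is a placeholder):

-- is_rrscan = 't' on a scan step indicates Redshift used the zonemap to skip blocks.
SELECT query, seg, step, label, is_rrscan, rows
FROM svl_query_summary
WHERE query = <query_id>
ORDER BY seg, step;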

Since the zonemap is generated from the first sort key column, its effectiveness is tied to how many blocks that column takes up. The more blocks it has, the more entries will be in the zonemap, and so the more fine grained the filtering can be. For example, if the first sort key column is highly compressed down to 2 blocks, there will only be 2 entries in the zonemap. For your sort key to be maximally effective, the first sort key column must then occupy as many blocks as the maximum occupied by any column in the table. Setting the first sort key column to raw encoding will usually fulfill this condition.
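
A hedged way to check whether that condition holds on an existing table (the table name 'orders' is a placeholder; the highest three col numbers are hidden system metadata columns):

-- Number of 1 MB blocks each column occupies. If the first sort key column has far
-- fewer blocks than the widest column, the zonemap will be coarse.
SELECT b.col, COUNT(*) AS blocks
FROM stv_blocklist b
JOIN stv_tbl_perm p ON b.tbl = p.id AND b.slice = p.slice
WHERE TRIM(p.name) = 'orders'
GROUP BY b.col
ORDER BY b.col;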

Why does Redshift behave like this?

You would think Redshift could be more intelligent and index each column separately in its own zonemap. This would eliminate the dependency on the first sort key column's block count. However, it would require searching N zonemaps instead; and the Redshift creators have evidently determined that for the average Redshift user/workload, the extra filtering effectiveness is not worth that overhead.

You may also wonder why Redshift generates the zonemap from the first sort key column's blocks at all, instead of, say, one entry per 500 rows. Everything in Redshift is block based, e.g. caching min/max values for every block in STV_BLOCKLIST. They would instead need to store an STV_ROWCHUNKLIST of sorts, which stores min/max for chunks of 500 rows. This could improve filtering, but in some cases may require additional zonemap scanning if blocks contain more than 500 rows. So once again I assume the Redshift creators have determined that for the average Redshift user/workload, the extra zonemap filtering overhead would not be worth it.

Claque answered 1/3 at 21:22 Comment(1)
I really appreciate all of your considerations. It helps a lot, not only for us but also for others who read this topic. Thank you so much.Pinelli
