How does the Leveled Compaction Strategy ensure 90% of reads are from one sstable

LeveledCompactionStrategy (LCS) in Cassandra implements the internals of LevelDB. You can check the exact implementation details in LevelDB implementation doc.

In order to give you a simple explanation take into account the following points:

Every SSTable is created when a fixed (relatively small) size limit is reached. By default L0 gets 5MB files of files, and each subsequent level is 10x the size. (in L1 you'll have 50MB of data, L2 500MB, and so on).
SSTables are created with the guarantee that they don't overlap
When a level fills up, a compaction is triggered and stables from level-L are promoted to level-L+1. So, in L1 you'll have 50MB in ~10 files, L2 500MB in ~100 files, etc..

In bold are the relevant details that justify the 90% reads from the same file (SSTable). Let's do the math together and everything will become clearer.

Imagine you have keys A,B,C,D,E in L0, and each keys takes 1MB of data.

Next we insert key F. Because level 0 is filled a compaction will create a file with [A,B,C,D,E] in level 1, and F will remain in level 0.

That's ~83% of data in 1 file in L1.

Next we insert G,H,I,J and K. So L0 fills up again, L1 gets a new sstable with [I,G,H,I,J].

By now we have K in L0, [A,B,C,D,E] and [F,G,H,I,J] in L1

And that's ~90% of data in L1.

If we continue inserting keys we will get around the same behavior so, that's why you get 90% of reads served from roughly the same file/SSTable.

A more in depth and detailed (what happens with updates and tombstones) info is given in this paragraph on the link I mentioned (the sizes for compaction election are different because they are LevelDB defaults, not C*s):

When the size of level L exceeds its limit, we compact it in a background thread. The compaction picks a file from level L and all overlapping files from the next level L+1. Note that if a level-L file overlaps only part of a level-(L+1) file, the entire file at level-(L+1) is used as an input to the compaction and will be discarded after the compaction. Aside: because level-0 is special (files in it may overlap each other), we treat compactions from level-0 to level-1 specially: a level-0 compaction may pick more than one level-0 file in case some of these files overlap each other.

A compaction merges the contents of the picked files to produce a sequence of level-(L+1) files. We switch to producing a new level-(L+1) file after the current output file has reached the target file size (2MB). We also switch to a new output file when the key range of the current output file has grown enough to overlap more then ten level-(L+2) files. This last rule ensures that a later compaction of a level-(L+1) file will not pick up too much data from level-(L+2).

Recommended topics

Hot tags