Cassandra control SSTable size
Is there a way to control the maximum size of an SSTable, for example 100 MB, so that when there is more than 100 MB of data for a column family, Cassandra creates the next SSTable?

Smidgen answered 1/4, 2015 at 13:30 Comment(0)
Unfortunately the answer is not so simple: the sizes of your SSTables are influenced by your compaction strategy, and there is no direct way to control a maximum SSTable size.

SSTables are initially created when memtables are flushed to disk. The size of these initial SSTables depends on your memtable settings and the size of your heap (memtable_total_space_in_mb being a large influencer); typically they are pretty small. SSTables are then merged together as part of a process called compaction.
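As a rough illustration, the memtable knobs that influence how large a freshly flushed SSTable can be live in cassandra.yaml. The values below are only examples, not recommendations, and the property names are from the 2.0-era config:

```yaml
# cassandra.yaml -- illustrative values only.
# Total memory memtables may use before Cassandra starts flushing
# the largest ones to disk as SSTables (2.1+ splits this into
# memtable_heap_space_in_mb / memtable_offheap_space_in_mb):
memtable_total_space_in_mb: 2048
# Number of concurrent memtable flush writer threads:
memtable_flush_writers: 1
```

Lower limits here generally mean more frequent flushes and therefore more, smaller initial SSTables.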

If you use Size-Tiered Compaction Strategy (STCS), you can end up with really large SSTables. STCS triggers a minor compaction when there are at least min_threshold (default 4) SSTables of similar size, combining them into one file, expiring data, and merging keys. Over time this can produce very large SSTables.
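The STCS thresholds can be tuned per table in CQL; the keyspace and table names here are hypothetical:

```sql
-- min_threshold / max_threshold bound how many similarly sized
-- SSTables STCS will merge in a single minor compaction.
ALTER TABLE mykeyspace.mytable
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': 4,
  'max_threshold': 32
};
```

Raising min_threshold delays compaction (more, smaller SSTables for longer); note neither option caps the size of the merged output file.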

Using Leveled Compaction Strategy (LCS), there is an sstable_size_in_mb option that controls a target size for SSTables. In general SSTables will be less than or equal to this size, unless a partition key holds a lot of data ('wide rows').
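For example, a table can be created with LCS and a target SSTable size up front; the schema below is hypothetical, and 160 is the default for this option:

```sql
-- Hypothetical time-series schema. sstable_size_in_mb is a target,
-- not a hard cap: a wide partition can still exceed it.
CREATE TABLE mykeyspace.events (
  id      uuid,
  ts      timestamp,
  payload text,
  PRIMARY KEY (id, ts)
) WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};
```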

I haven't experimented much with Date-Tiered Compaction Strategy yet, but it works similarly to STCS in that it merges files of similar size, while keeping data together in time order. It also has a setting to stop compacting old data (max_sstable_age_days), which could be interesting.
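A DTCS configuration might look like the following; the table name and values are hypothetical:

```sql
-- Hypothetical metrics table. SSTables whose data is older than
-- max_sstable_age_days are no longer considered for compaction.
ALTER TABLE mykeyspace.metrics
WITH compaction = {
  'class': 'DateTieredCompactionStrategy',
  'base_time_seconds': 3600,
  'max_sstable_age_days': 365
};
```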

The key is to find the compaction strategy that works best for your data, and then tune its properties for your data model and environment.

You can read more about the configuration settings for compaction here and read this guide to help understand whether STCS or LCS is appropriate for you.

Connacht answered 1/4, 2015 at 13:56 Comment(5)
Should also add: there is a happy medium when it comes to SSTable size. You don't want your SSTables to be too small either, as that makes it more likely for rows to be spread across many SSTables, requiring more reads to fetch your data. How large your SSTables should be depends on your environment and requirements, so it's good to test and tune what works best for you. Connacht
Thanks for the info. Where do I specify sstable_size_in_mb? I tried putting it as sstable_size_in_mb: 40 in conf/cassandra.yaml, but Cassandra startup failed with the error org.apache.cassandra.exceptions.ConfigurationException: Invalid yaml. Please remove properties [sstable_size_in_mb] from your cassandra.yaml. Smidgen
@RRMadhav, chances are your table is still using SizeTieredCompactionStrategy. This option is only supported with LeveledCompactionStrategy; you can change your compaction strategy with the following CQL command: ALTER TABLE tablename WITH compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 40 }. I'd recommend using the default SSTable size of 160 MB, as this is what the Cassandra team has found to be ideal, and speaking from experience, having a lot of tiny SSTables is not good for read performance. Connacht
No, I created the table with compaction={'class': 'LeveledCompactionStrategy'} AND ..., and in the desc table output it shows the same: compaction={'class': 'LeveledCompactionStrategy'} AND Smidgen
Oh I see, sstable_size_in_mb does not go in your cassandra.yaml; it's part of your table configuration. Connacht
