What is an SSTable?

5

156

In BigTable/GFS and Cassandra terminology, what is the definition of an SSTable?

Insuperable answered 4/4, 2010 at 21:46 Comment(1)
This is a great intro post to SSTables: igvita.com/2012/02/06/…Intorsion
152

A Sorted Strings Table (a term borrowed from Google) is a file of key/value string pairs, sorted by key.

Oasis answered 4/4, 2010 at 22:21 Comment(7)
Thanks for yet another excellent SO Cassandra answer! BTW, have you seen this question: #2573606Insuperable
Is it generally immutable?Daw
yes, sstables are immutable by design -- which is an awesome featureOasis
How can it both be sorted and immutable then?Sparker
@Sparker did you find the answer to this?Cowled
@Sparker The SSTable segment is immutable. The sorting happens in memory, using a sorted data structure such as an AVL tree. When you flush the in-memory structure to disk, you write it out in sorted order. Once a segment has been written to disk, it cannot be edited, hence it is immutable.Bridgman
Yes, the in-memory tree (i.e. "memtable") provides the sorted functionality. When the memtable reaches some threshold, it's written to disk as an SSTable segment. So by the time a segment is created, it's already sorted and immutable (since all new entries are written to the memtable). Lastly, there is usually a merging and compaction process to combine segments with overwritten/deleted values.Pentecostal
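The flow described in these comments (sorted in-memory memtable, flush to an immutable segment, later compaction) can be sketched roughly as below. This is only an illustration, not Cassandra's or BigTable's actual code: the `Memtable` class, the `limit` threshold, the `TOMBSTONE` marker and the `compact` helper are all made-up names, a sorted list stands in for the balanced tree, and tuples stand in for on-disk segments.

```python
import bisect

TOMBSTONE = None  # hypothetical marker for a deleted key


class Memtable:
    """In-memory sorted buffer; a sorted list stands in for an AVL tree."""

    def __init__(self, limit=3):
        self.keys = []      # kept in sorted order
        self.values = {}    # key -> latest value
        self.limit = limit  # assumed flush threshold (number of keys)

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def is_full(self):
        return len(self.keys) >= self.limit

    def flush(self):
        """Return an immutable, key-sorted segment and reset the memtable."""
        segment = tuple((k, self.values[k]) for k in self.keys)
        self.keys, self.values = [], {}
        return segment


def compact(segments):
    """Merge segments given oldest to newest; the newest value wins
    and keys whose latest value is a tombstone are dropped."""
    merged = {}
    for segment in segments:
        for key, value in segment:
            merged[key] = value
    return tuple((k, v) for k, v in sorted(merged.items()) if v is not TOMBSTONE)
```

Each flushed segment is never modified afterwards; an overwrite or delete simply lands in a newer segment, and `compact` reconciles them later.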
68

"An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings. Operations are provided to look up the value associated with a specified key, and to iterate over all key/value pairs in a specified key range. Internally, each SSTable contains a sequence of blocks (typically each block is 64KB in size, but this is configurable). A block index (stored at the end of the SSTable) is used to locate blocks; the index is loaded into memory when the SSTable is opened. A lookup can be performed with a single disk seek: we first find the appropriate block by performing a binary search in the in-memory index, and then reading the appropriate block from disk. Optionally, an SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk."
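As a rough illustration of the lookup path in that quotation (binary search over an in-memory block index, then a single block read), here is a minimal sketch. It is not the real file format: `build_sstable` and `lookup` are hypothetical helpers, blocks are plain Python lists rather than 64KB byte ranges on disk, and the index is assumed to hold the first key of each block.

```python
import bisect


def build_sstable(pairs, block_size=2):
    """Split key-sorted (key, value) pairs into fixed-size blocks and
    build an index holding the first key of each block."""
    blocks = [pairs[i:i + block_size] for i in range(0, len(pairs), block_size)]
    index = [block[0][0] for block in blocks]
    return blocks, index


def lookup(blocks, index, key):
    """Find the value for key, or None if the key is absent."""
    # Binary search the in-memory index for the last block whose
    # first key is <= the requested key.
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    # In a real SSTable this would be the single disk read.
    for k, v in blocks[i]:
        if k == key:
            return v
    return None
```

For example, with five sorted pairs and two pairs per block, a lookup touches exactly one block, however many blocks the table holds.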

Aubree answered 16/11, 2010 at 12:1 Comment(4)
"without touching disk" -> "without being aware that the disk is being touched". Memory mapped IO is a very handy technique because it delegates the actual IO to the OS, assuming it can do a good job at caching (especially when several processes share the same file). But it has the disadvantage that you don't have control of it. If the page is not resident in memory, the thread will block and cannot perform other operations; contrast it with "async IO", where you can register a callback and do other stuff in the same thread, while the IO is pending.Parenteral
@ithkuil: You can absolutely have control of memory mapped IO at least to the point of being able to assure that certain pages are in memory or have been committed to disk (there is still wiggle room for pages that aren't guaranteed to be in memory but very well could be). That's what wondrous things like mlock(), msync(), and MAP_LOCKED are all about. You can also get an understanding of what currently is and isn't paged in through mincore().Bernete
@ChristopherSmith: yes you are right, there are ways to control it. However, they are usually used for performance-critical (realtime) sections or for security-related issues (like preventing an in-memory password from being swapped to disk). Memory mapped files are very useful exactly because you don't have to decide how much of them to keep in memory; otherwise you could just read the whole file into memory without mmap and achieve the same effect. In fact, I just grepped through the cassandra code; the only call is mlockall(MCL_CURRENT); done at startup. See also: goo.gl/AEgPMParenteral
The above quotation is from the BigTable paper.Mastermind
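To make the memory-mapping discussion in these comments concrete, here is a small sketch using Python's standard `mmap` module. The helper name `read_via_mmap` and its `offset`/`length` parameters are invented for illustration. The slice looks like an in-memory read, but if the page is not resident the OS loads it from disk and the calling thread blocks in the meantime, which is the point the first comment makes.

```python
import mmap


def read_via_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies the bytes out; any page fault is handled
            # transparently by the OS.
            return mapped[offset:offset + length]
```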
7
  • SSTable (engl. Sorted Strings Table) is a file of key/value string pairs, sorted by keys.

  • An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings.

  • Internally, each SSTable contains a sequence of blocks (typically
    each block is 64KB in size, but this is configurable).

Jud answered 3/7, 2015 at 17:9 Comment(0)
6

A tablet is stored in the form of SSTables.

An SSTable (directly mapped to GFS) is key-value based immutable storage. It stores chunks of data, each 64KB in size.

Definitions:

  • Index of the keys: maps each key to its starting location
  • Chunk: a storage unit in GFS; replica management is done per chunk
Trilbee answered 3/5, 2013 at 15:19 Comment(0)
1

SSTable means "sorted string table", a table of key-value pairs. In Cassandra, SSTables are immutable and sorted by key.

Karole answered 6/3, 2021 at 12:47 Comment(0)
