Optimizations around Cold SSTables
Cassandra stores data in large, immutable files called SSTables, which compaction periodically merges to evict obsolete data and improve read performance. In older versions, every SSTable was indexed at the same granularity, so the resources spent on indexing grew in proportion to the amount of data stored. Not all SSTables are equally important, however, especially in time series data models, where recently written data is read far more often than older data.

Cassandra 2.0.2 introduced tracking of per-SSTable read rates and allowed manual tuning on a per-table basis. Cassandra 2.0.3 used this data to improve size-tiered compaction, and 2.1 added automatic resource management. Two optimizations for cold SSTables were implemented: compacting the hottest SSTables first, and skipping the coldest SSTables altogether through a new compaction strategy option, 'cold_reads_to_omit'. Starting with Cassandra 2.1, this option is enabled by default with a value of 0.05.

Cassandra 2.1 also reduces memory usage on systems with many cold SSTables by moving index summaries off-heap and periodically resizing them to fit within a fixed-size memory pool.
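The two compaction optimizations can be illustrated with a minimal Python sketch. It assumes a hypothetical SSTable record carrying a recent read rate and an estimated key count, and it defines "hotness" as reads per key, which is an assumption about how SSTables of different sizes are compared. filter_cold_sstables reflects one reading of 'cold_reads_to_omit': the coldest SSTables are skipped as long as their combined reads stay within that fraction of all reads. The bucketing parameters (bucket_low, bucket_high, min_sstable_size, min_threshold, max_threshold) borrow the names and defaults of size-tiered compaction's options, but every function here is a hypothetical simplification, not Cassandra's Java implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 2 ** 20


@dataclass
class SSTable:
    # Hypothetical stand-in for an SSTable and its tracked read statistics.
    name: str
    size_bytes: int
    reads_per_sec: float  # recent read rate recorded for this SSTable
    estimated_keys: int

    @property
    def hotness(self) -> float:
        # Assumed metric: reads per key, so small SSTables are compared fairly.
        return self.reads_per_sec / max(self.estimated_keys, 1)


def filter_cold_sstables(sstables: List[SSTable], cold_reads_to_omit: float) -> List[SSTable]:
    """Drop the coldest SSTables whose combined reads fit within
    cold_reads_to_omit * (total reads across all SSTables)."""
    total_reads = sum(s.reads_per_sec for s in sstables)
    if cold_reads_to_omit <= 0 or total_reads == 0:
        # Option disabled, or no read data yet: every SSTable stays a candidate.
        return list(sstables)
    budget = cold_reads_to_omit * total_reads
    coldest_first = sorted(sstables, key=lambda s: s.hotness)
    skipped_reads = 0.0
    cutoff = 0
    for s in coldest_first:
        if skipped_reads + s.reads_per_sec > budget:
            break  # skipping this SSTable would exceed the read budget
        skipped_reads += s.reads_per_sec
        cutoff += 1
    return coldest_first[cutoff:]


def size_buckets(sstables: List[SSTable], bucket_low: float = 0.5,
                 bucket_high: float = 1.5, min_sstable_size: int = 50 * MB) -> List[List[SSTable]]:
    """Group SSTables of similar size, in the spirit of size-tiered compaction."""
    buckets: List[List[SSTable]] = []
    for s in sorted(sstables, key=lambda s: s.size_bytes):
        for bucket in buckets:
            avg = sum(t.size_bytes for t in bucket) / len(bucket)
            similar = avg * bucket_low <= s.size_bytes <= avg * bucket_high
            both_small = s.size_bytes < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(s)
                break
        else:
            buckets.append([s])  # no similar bucket found: start a new one
    return buckets


def pick_hottest_bucket(buckets: List[List[SSTable]], min_threshold: int = 4,
                        max_threshold: int = 32) -> Optional[List[SSTable]]:
    """Among buckets large enough to compact, return the one with the highest
    total hotness, trimmed to its max_threshold hottest SSTables."""
    best, best_score = None, -1.0
    for bucket in buckets:
        if len(bucket) < min_threshold:
            continue
        trimmed = sorted(bucket, key=lambda s: s.hotness, reverse=True)[:max_threshold]
        score = sum(s.hotness for s in trimmed)
        if score > best_score:
            best, best_score = trimmed, score
    return best


def choose_compaction(sstables: List[SSTable],
                      cold_reads_to_omit: float = 0.05) -> Optional[List[SSTable]]:
    candidates = filter_cold_sstables(sstables, cold_reads_to_omit)
    return pick_hottest_bucket(size_buckets(candidates))


# Demo: four small hot SSTables, four medium warm ones, four large cold ones.
hot = [SSTable(f"hot{i}", 10 * MB, 100.0, 1_000) for i in range(4)]
warm = [SSTable(f"warm{i}", 200 * MB, 30.0, 20_000) for i in range(4)]
cold = [SSTable(f"cold{i}", 1_000 * MB, 0.5, 100_000) for i in range(4)]
sstables = hot + warm + cold

print([s.name for s in filter_cold_sstables(sstables, 0.05)])  # cold0..cold3 are omitted
chosen = choose_compaction(sstables)
print([s.name for s in chosen])  # ['hot0', 'hot1', 'hot2', 'hot3']
```

The zero-reads guard is a deliberate choice in this sketch: without it, a write-only table would have every SSTable treated as cold and would never compact.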
Company
DataStax
Date published
Dec. 3, 2013
Author(s)
Tyler Hobbs
Word count
872
Hacker News points
None found.
Language
English