Compaction Improvements in Cassandra 2.1
Cassandra avoids in-place updates: writes go to a commitlog and are later flushed to immutable SSTables, so disk access is largely sequential and seek times stay low. The trade-off is that the full history of changes to a row can be spread across several SSTables, which slows reads that have to consult all of them. Cassandra's compaction routine addresses this by merging SSTables and reducing that history to a single set of the most recent data per row. Compaction has a cost of its own, however. The operating system keeps frequently accessed files in its page cache, and compaction destroys the original SSTables and creates a new one that is not yet in the cache, so reads that follow a compaction suffer cache misses. The patch introduced in CASSANDRA-6916 improves on this with incremental replacement of compacted SSTables: data can be read from the new SSTable before it has finished writing, and the new file gradually replaces the old ones in the page cache. The result is more predictable, high performance for Cassandra under heavy load.
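The reduction that compaction performs can be illustrated with a small sketch. It is not Cassandra's implementation: the tuple layout `(row_key, write_timestamp, value)`, the function name `compact`, and the sample data are assumptions made for illustration, and real compaction also reconciles individual cells and handles tombstones, which are left out here.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(sstables):
    """Merge key-sorted SSTables, keeping only the newest version of each row.

    Each SSTable is modeled (hypothetically) as a list of
    (row_key, write_timestamp, value) tuples sorted by row_key.
    """
    # Stream all inputs in row_key order without loading them into one list.
    merged = heapq.merge(*sstables, key=itemgetter(0))
    compacted = []
    for _, versions in groupby(merged, key=itemgetter(0)):
        # Among all versions of a row, the highest write timestamp wins.
        compacted.append(max(versions, key=itemgetter(1)))
    return compacted


# Two SSTables holding overlapping history for row "a".
older = [("a", 1, "v1"), ("b", 1, "v1")]
newer = [("a", 2, "v2"), ("c", 3, "v1")]

compacted = compact([older, newer])
print(compacted)
```

The output holds one entry per row, so a read no longer has to consult both inputs. In this model the new table only becomes usable once the whole merge has finished, which is the situation that incremental replacement is meant to improve.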
Company
DataStax
Date published
April 24, 2014
Author(s)
Ryan McGuire
Word count
592
Hacker News points
None found.
Language
English