
Leveled Compaction in Apache Cassandra

What's this blog post about?

Cassandra's log-structured storage engine turns all updates into data files called sstables, which are written sequentially to disk. This design underlies both its performance and features such as application-transparent compression. Over time, multiple versions of a row may exist in different sstables, each with a different set of columns. To keep read speed from deteriorating, compaction runs in the background and merges sstables together. Cassandra's size-tiered compaction strategy resembles the one described in Google's Bigtable paper: it combines sstables once enough similar-sized ones are present. This approach has problems with update-heavy workloads. Cassandra 1.0 introduces the Leveled Compaction Strategy, based on LevelDB from Google's Chromium team. This strategy creates fixed-size sstables grouped into levels, and sstables within a level are guaranteed not to overlap. Each level is ten times as large as the previous one. This approach addresses the problems with size-tiered compaction and can be enabled by setting the compaction_strategy option to LeveledCompactionStrategy. Leveled compaction performs roughly twice as much I/O as size-tiered compaction. That extra work pays off for update-heavy workloads, where many obsolete row versions accumulate, but much less for insert-mostly workloads, where few obsolete row versions are involved.
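The two structural properties above (tenfold growth per level, and no overlap within a level) can be illustrated with a small sketch. This is not Cassandra's implementation: the function names, the `(first_key, last_key)` tuple representation of an sstable, and the default of ten sstables in the first level are assumptions made for illustration only.

```python
from bisect import bisect_right

FANOUT = 10  # from the post: each level is ten times as large as the previous


def max_sstables(level, l1_sstables=10):
    """Hypothetical capacity of a level, counted in fixed-size sstables.

    l1_sstables is an assumed size for the first level, not a value
    taken from the post.
    """
    if level < 1:
        raise ValueError("levels are numbered from 1 in this sketch")
    return l1_sstables * FANOUT ** (level - 1)


def is_non_overlapping(ranges):
    """Check that no two (first_key, last_key) ranges in a level overlap."""
    ordered = sorted(ranges)
    return all(prev[1] < cur[0] for prev, cur in zip(ordered, ordered[1:]))


def sstable_for_key(sorted_ranges, key):
    """Return the single range in a non-overlapping level that may hold key.

    sorted_ranges must be sorted by first_key. Because the ranges do not
    overlap, at most one of them can contain the key.
    """
    starts = [first for first, _ in sorted_ranges]
    i = bisect_right(starts, key) - 1
    if i >= 0 and sorted_ranges[i][0] <= key <= sorted_ranges[i][1]:
        return sorted_ranges[i]
    return None
```

Because a level never contains overlapping sstables, a lookup in this model consults at most one sstable per level, which is why the number of levels bounds the work a read has to do.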

Company
DataStax

Date published
Oct. 10, 2011

Author(s)
Jonathan Ellis

Word count
568

Language
English

Hacker News points
None found.