
Leveled Compaction in Apache Cassandra

What's this blog post about?

Cassandra's log-structured storage engine turns every update into data files called sstables, which are written sequentially to disk. This design underpins its performance and features such as application-transparent compression. Over time, multiple versions of a row may exist in different sstables, each with a different set of columns. To keep read speed from deteriorating, compaction runs in the background and merges sstables together. Cassandra's size-tiered compaction strategy is similar to the one described in Google's Bigtable paper: it combines sstables once enough similar-sized ones are present. However, this approach has issues with update-heavy workloads. Cassandra 1.0 introduces the Leveled Compaction Strategy, based on LevelDB from Google's Chromium team. This strategy creates fixed-size sstables grouped into levels and guarantees that sstables within a level do not overlap. Each level is ten times as large as the previous one. This approach addresses the problems with size-tiered compaction and can be enabled by setting the compaction_strategy option to LeveledCompactionStrategy. Leveled compaction performs roughly twice as much I/O as size-tiered compaction, but it benefits update-heavy workloads because fewer obsolete row versions are involved.
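The two structural rules in the summary (each level is ten times as large as the previous one, and sstables within a level do not overlap) can be illustrated with a minimal Python sketch. This is not Cassandra's implementation: the function names, the 5 MB sstable size, and the choice that level 1 holds ten sstables are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed values for illustration only; the summary states just the 10x ratio.
SSTABLE_SIZE_MB = 5
FANOUT = 10


def level_capacity_mb(level: int) -> int:
    """Total capacity of a level, assuming level 1 holds FANOUT sstables
    and every later level is FANOUT times as large as the one before it."""
    if level < 1:
        raise ValueError("level must be >= 1")
    return SSTABLE_SIZE_MB * FANOUT ** level


def is_non_overlapping(key_ranges: List[Tuple[str, str]]) -> bool:
    """Return True if no two (first_key, last_key) ranges share a key,
    which is the invariant leveled compaction keeps within a level."""
    ordered = sorted(key_ranges)
    for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:]):
        if next_first <= prev_last:
            return False
    return True


if __name__ == "__main__":
    for lvl in (1, 2, 3):
        print(f"L{lvl}: {level_capacity_mb(lvl)} MB")
    print(is_non_overlapping([("a", "f"), ("g", "m")]))
    print(is_non_overlapping([("a", "h"), ("g", "m")]))
```

Because ranges within a level never overlap, a read for a given key needs to consult at most one sstable per level, which is why this layout helps when rows are updated frequently.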

Company
DataStax

Date published
Oct. 10, 2011

Author(s)
Jonathan Ellis

Word count
568

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.