More Efficient Repairs in 2.1

What's this blog post about?

Repairs are crucial for maintaining data consistency in a Cassandra cluster, especially when data is deleted frequently. The nodetool repair command initiates the repair process on a specific node, which computes a Merkle tree for each range of data it holds. Cassandra 2.1 introduces incremental repairs, which record which data has already been repaired and compute Merkle trees only for SSTables that have not previously been repaired, so the cost of a repair no longer grows with the whole dataset. An incremental repair proceeds in five steps: the coordinator sends a prepare message, each replica builds Merkle trees from its unrepaired SSTables, the trees are compared, streaming requests are issued for the ranges that differ, and finally an anticompaction command segregates repaired and unrepaired ranges into separate SSTables. Full repairs remain the default, but incremental repairs can be opted into via the -inc option to nodetool repair.
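
The steps described above can be illustrated with a small in-memory sketch. This is not Cassandra's implementation: the SSTable class, the TOKEN_SPACE constant, and all function names are hypothetical, rows are plain token-to-value dictionaries, streaming only copies rows that are missing on one side, and anticompaction is reduced to flagging whole tables as repaired rather than splitting them by range.

```python
import hashlib
from dataclasses import dataclass

TOKEN_SPACE = 1024  # hypothetical, tiny token space for illustration


@dataclass
class SSTable:
    """Hypothetical stand-in for an SSTable: rows keyed by token."""
    rows: dict
    repaired: bool = False


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def unrepaired_rows(sstables):
    """Merge rows from SSTables that have not been repaired yet."""
    merged = {}
    for table in sstables:
        if not table.repaired:
            merged.update(table.rows)
    return merged


def bucket_of(token, num_leaves):
    """Map a token to the index of the sub-range (leaf) that covers it."""
    return token * num_leaves // TOKEN_SPACE


def leaf_hashes(rows, num_leaves):
    """Hash the rows of each sub-range; these are the Merkle tree leaves."""
    buckets = [[] for _ in range(num_leaves)]
    for token, value in rows.items():
        buckets[bucket_of(token, num_leaves)].append((token, value))
    return [_h(repr(sorted(b)).encode()) for b in buckets]


def merkle_root(leaves):
    """Fold leaf hashes pairwise up to a root (leaf count: a power of two)."""
    level = list(leaves)
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def incremental_repair(node_a, node_b, num_leaves=4):
    """Repair two replicas using only unrepaired data; return differing leaves."""
    rows_a, rows_b = unrepaired_rows(node_a), unrepaired_rows(node_b)
    leaves_a = leaf_hashes(rows_a, num_leaves)
    leaves_b = leaf_hashes(rows_b, num_leaves)

    # Compare trees: equal roots mean no sub-range needs streaming.
    if merkle_root(leaves_a) == merkle_root(leaves_b):
        diff = []
    else:
        diff = [i for i in range(num_leaves) if leaves_a[i] != leaves_b[i]]

    # "Stream" rows in mismatched sub-ranges that the other side lacks.
    for src, dst_rows, dst in ((rows_a, rows_b, node_b), (rows_b, rows_a, node_a)):
        missing = {t: v for t, v in src.items()
                   if bucket_of(t, num_leaves) in diff and t not in dst_rows}
        if missing:
            dst.append(SSTable(missing))

    # Simplified anticompaction: flag everything that took part as repaired.
    for table in node_a + node_b:
        table.repaired = True
    return diff
```

Running incremental_repair a second time on the same two replicas finds no unrepaired SSTables, so both sides hash empty sub-ranges and nothing is streamed. That is the saving the incremental approach aims for in this toy model: data that was already repaired is never hashed again.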

Company
DataStax

Date published
Feb. 27, 2014

Author(s)
Lyuben Todorov

Word count
612

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.