More Efficient Repairs in 2.1
Repairs are crucial for maintaining data consistency in a Cassandra cluster, especially when data is deleted frequently. The nodetool repair command initiates the repair process on a specific node by computing a Merkle tree for each range of data on that node. Cassandra 2.1 introduces incremental repairs, which keep track of data that has already been repaired and calculate Merkle trees only for SSTables that have not previously undergone repair, so the repair process stays efficient as datasets grow. An incremental repair proceeds in five steps: sending a prepare message, building Merkle trees from the unrepaired SSTables, comparing the trees, issuing streaming requests for the ranges that differ, and finally issuing an anticompaction command that segregates repaired and unrepaired ranges into separate SSTables. Full repairs remain the default, but incremental repairs can be opted into via the -inc option to nodetool repair.
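The tree comparison and anticompaction steps can be illustrated with a toy model. The sketch below is not Cassandra's implementation: the function names, the 16-bit token space, the fixed-width leaf buckets, and the use of SHA-256 are all assumptions chosen for brevity, and num_leaves is assumed to be a power of two. It only shows why comparing trees narrows a repair down to the ranges that actually differ, and how an SSTable's rows could be split once a range is known to be repaired.

```python
import hashlib


def leaf_hashes(rows, num_leaves, token_space=2**16):
    """Hash rows into fixed-width token buckets; each bucket is one leaf.

    rows maps an integer token to a value (a stand-in for a partition).
    """
    buckets = [hashlib.sha256() for _ in range(num_leaves)]
    width = token_space // num_leaves
    for token, value in sorted(rows.items()):
        index = min(token // width, num_leaves - 1)
        buckets[index].update(f"{token}:{value}".encode())
    return [b.digest() for b in buckets]


def build_tree(leaves):
    """Return the tree as a list of levels: leaves first, root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            hashlib.sha256(prev[i] + prev[i + 1]).digest()
            for i in range(0, len(prev), 2)
        ])
    return levels


def differing_leaves(tree_a, tree_b):
    """Descend from the root, visiting only subtrees whose hashes differ."""
    top = len(tree_a) - 1
    stack, out = [(top, 0)], []
    while stack:
        level, idx = stack.pop()
        if tree_a[level][idx] == tree_b[level][idx]:
            continue  # identical subtree: nothing to stream
        if level == 0:
            out.append(idx)
        else:
            stack.extend([(level - 1, 2 * idx), (level - 1, 2 * idx + 1)])
    return sorted(out)


def anticompact(sstable_rows, repaired_ranges):
    """Split rows into (repaired, unrepaired) by half-open token ranges."""
    repaired, unrepaired = {}, {}
    for token, value in sstable_rows.items():
        in_range = any(lo <= token < hi for lo, hi in repaired_ranges)
        (repaired if in_range else unrepaired)[token] = value
    return repaired, unrepaired


# Two hypothetical replicas that disagree on a single row.
replica_a = {100: "a", 20000: "b", 40000: "c", 60000: "d"}
replica_b = {100: "a", 20000: "b", 40000: "stale", 60000: "d"}

tree_a = build_tree(leaf_hashes(replica_a, num_leaves=4))
tree_b = build_tree(leaf_hashes(replica_b, num_leaves=4))
mismatched = differing_leaves(tree_a, tree_b)
```

In this model only the leaf covering token 40000 is reported as mismatched, so only that range would need to be streamed. After a successful repair of a range, anticompact shows the idea behind the final step: rows inside the repaired range go to one output and the rest to another, so a later incremental repair can skip the repaired output.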
Company
DataStax
Date published
Feb. 27, 2014
Author(s)
Lyuben Todorov
Word count
612
Language
English
Hacker News points
None found.