
Advanced repair techniques

What's this blog post about?

Anti-entropy repair in Cassandra is challenging for two reasons: it can put heavy load on disk IO, and it must be run before gc_grace expires. Reliable hints reduce the need for routine repair, but repair is still required after a node is lost. Repair runs in two phases: each replica builds a Merkle tree of its data, then the trees are compared and the differences are streamed between replicas. Compaction throttling limits the disk IO cost of the first phase. The -pr option restricts repair to a node's primary range, but it does not fully solve the problem, because the other replicas of that range still have to compute their own Merkle trees. The -snapshot option helps by taking a snapshot of the data and repairing from it sequentially, so that only one replica at a time performs validation compaction. The second phase has its own problem, overstreaming: because a Merkle tree has fixed precision, each leaf covers many partitions, and a mismatch in a single partition causes every partition under that leaf to be sent. Subrange repair, available since Cassandra 1.1.11, repairs only a portion of the data belonging to a node. Since each subrange gets its own tree, this effectively increases precision and can eliminate overstreaming.
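The effect of subrange repair on precision can be illustrated with a short Python sketch. It is not taken from the post: the helper names, the token bounds, the keyspace name `my_keyspace`, the leaf count of 32768, and the partition count are all assumptions chosen for illustration. The only real interface it relies on is the `-st` (start token) and `-et` (end token) pair of options to `nodetool repair`. The sketch splits one token range into equal subranges, prints one repair command per subrange, and shows how the number of partitions covered by each Merkle tree leaf falls as the range is split.

```python
def split_token_range(start: int, end: int, pieces: int) -> list[tuple[int, int]]:
    """Split the token range (start, end] into contiguous subranges.

    Hypothetical helper: wrapping ranges (end <= start) are not handled.
    """
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    if end <= start:
        raise ValueError("wrapping ranges are not handled in this sketch")
    width = end - start
    # Integer arithmetic keeps the boundaries exact for very large tokens.
    bounds = [start + (width * i) // pieces for i in range(pieces + 1)]
    # Drop empty subranges, which appear when pieces exceeds the range width.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def partitions_per_leaf(partitions: int, leaves: int) -> float:
    """Average number of partitions covered by one Merkle tree leaf."""
    return partitions / leaves


def repair_commands(keyspace: str, subranges: list[tuple[int, int]]) -> list[str]:
    """Build one `nodetool repair` invocation per subrange."""
    return [f"nodetool repair -st {lo} -et {hi} {keyspace}" for lo, hi in subranges]


if __name__ == "__main__":
    ASSUMED_LEAVES = 32768        # assumption: fixed number of leaves per tree
    ASSUMED_PARTITIONS = 1_000_000  # assumption: partitions in the node's range
    PIECES = 4

    # Assumed token bounds, kept non-negative for readability.
    subranges = split_token_range(0, 1_000_000, PIECES)
    for command in repair_commands("my_keyspace", subranges):
        print(command)

    whole = partitions_per_leaf(ASSUMED_PARTITIONS, ASSUMED_LEAVES)
    split = partitions_per_leaf(ASSUMED_PARTITIONS // PIECES, ASSUMED_LEAVES)
    print(f"partitions per leaf, whole range: {whole:.1f}")
    print(f"partitions per leaf, per subrange: {split:.1f}")
```

Under these assumed numbers, each leaf of a whole-range tree covers roughly 30 partitions, so one stale partition drags its neighbours along in the stream. Splitting the range gives each subrange a full tree of its own, and with enough subranges a leaf covers about one partition. The sketch assumes partitions are spread evenly across the token range, which is only approximately true in practice.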

Company
DataStax

Date published
July 25, 2013

Author(s)
Brandon Williams

Word count
808

Language
English

Hacker News points
None found.

