Handling Disk Failures In Cassandra 1.2
This post describes how Cassandra 1.2 improves the handling of disk failures on a node. Prior to version 1.2, a single unavailable disk could make an entire replica unresponsive because of problems with memtables and commit log appends. The traditional workaround was to use RAID10 volumes, but that approach was becoming less feasible as data volumes grew. The upcoming Cassandra 1.2 release introduces a disk_failure_policy setting with two options: best_effort and stop. With stop, the affected node is shut down; with best_effort, the failed drive is blacklisted and the node keeps running on the remaining drives. Which one fits depends on the deployment's availability and consistency requirements. This improvement makes it possible to deploy Cassandra nodes with large disk arrays without the overhead of RAID10.
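The difference between the two policies can be illustrated with a small toy model. This is only a sketch of the behavior described above, not Cassandra code: the `Node` class, its fields, and the directory paths are hypothetical names invented for the example, and only the policy names `stop` and `best_effort` come from the post.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a node with several data directories (hypothetical, not Cassandra code)."""
    policy: str                                   # "stop" or "best_effort"
    data_dirs: list                               # configured data directories
    blacklisted: set = field(default_factory=set)
    serving: bool = True

    def usable_dirs(self):
        # Directories that have not been blacklisted after a failure.
        return [d for d in self.data_dirs if d not in self.blacklisted]

    def on_disk_failure(self, failed_dir):
        if self.policy == "stop":
            # Stop the whole node so that other replicas take over.
            self.serving = False
        elif self.policy == "best_effort":
            # Blacklist the failed drive and keep using the remaining ones.
            self.blacklisted.add(failed_dir)
        else:
            raise ValueError(f"unknown policy: {self.policy}")


# Usage: the same failure handled under each policy.
tolerant = Node("best_effort", ["/data1", "/data2", "/data3"])
tolerant.on_disk_failure("/data2")

strict = Node("stop", ["/data1", "/data2", "/data3"])
strict.on_disk_failure("/data2")
```

In this model, `tolerant` keeps serving with two directories while `strict` goes offline. That mirrors the trade-off in the post: best_effort favors availability, and stop avoids serving from a node that has lost part of its data.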
Company
DataStax
Date published
Oct. 11, 2012
Author(s)
Aleksey Yeschenko
Word count
237
Hacker News points
None found.
Language
English