Handling Disk Failures In Cassandra 1.2

Company

DataStax

Date Published

Oct. 11, 2012

Author

Aleksey Yeschenko

Word count

237

Language

English


URL

www.datastax.com/blog/handling-disk-failures-cassandra-12

Summary

The text discusses Cassandra's robustness to node failures and how version 1.2 extends it to individual disk failures. Prior to version 1.2, a single unavailable disk could make an entire replica unresponsive because of problems with memtables and commitlog appends. The traditional workaround was to put data on RAID10 volumes, but this approach was becoming less practical as data volumes grew. The upcoming Cassandra 1.2 release introduces a disk_failure_policy setting with two options, stop and best_effort. These let an operator handle a disk failure sensibly according to their availability and consistency requirements: stop takes the affected node out of service, while best_effort blacklists the failed drive and keeps the node serving from its remaining disks. This makes it practical to deploy Cassandra nodes with large disk arrays without the overhead of RAID10.
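To make the difference between the two policies concrete, here is a minimal Python sketch of the decision each one makes. It assumes a simplified node that only tracks its data directories. NodeState and handle_disk_failure are hypothetical names for illustration, not Cassandra internals; in Cassandra itself the choice is made by setting disk_failure_policy in cassandra.yaml.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified model of a node; not Cassandra's actual code.
@dataclass
class NodeState:
    data_dirs: List[str]
    blacklisted: List[str] = field(default_factory=list)
    running: bool = True

    def usable_dirs(self) -> List[str]:
        # Directories the node may still read from and write to.
        return [d for d in self.data_dirs if d not in self.blacklisted]


def handle_disk_failure(node: NodeState, failed_dir: str, policy: str) -> NodeState:
    """Apply a disk_failure_policy-style decision to the simulated node (illustrative only)."""
    if policy == "stop":
        # Take the whole node out of service instead of serving possibly incomplete data.
        node.running = False
    elif policy == "best_effort":
        # Blacklist only the failed drive; the node keeps serving from the others.
        if failed_dir not in node.blacklisted:
            node.blacklisted.append(failed_dir)
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return node


node = NodeState(["/data1", "/data2", "/data3"])
handle_disk_failure(node, "/data2", "best_effort")
print(node.running, node.usable_dirs())  # True ['/data1', '/data3']
```

In this sketch, stop trades availability for safety: the node goes away entirely and the rest of the cluster serves its ranges. best_effort keeps the node up, at the risk of answering from an incomplete set of data files.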