Interpreting Cassandra repair logs and leveraging the OpsCenter repair service

Company

DataStax

Date Published

Dec. 1, 2015

Author

Sebastian Estevez

Word count

650

Language

English

Hacker News points

None

URL

www.datastax.com/blog/interpreting-cassandra-repair-logs-and-leveraging-opscenter-repair-service

Summary

A Cassandra repair compares data between replica nodes, identifies inconsistencies, and streams the latest values for any mismatched data. Repairs are resource-intensive: generating Merkle trees consumes CPU, and streaming missing data consumes network bandwidth and disk I/O. The OpsCenter repair service splits repair jobs into smaller tasks and runs them continuously, which reduces manual work and smooths out spikes in resource usage. Each repair session is identified by a UUID, and the logs of a healthy repair contain only INFO messages, with no WARN or ERROR entries. Repairs can fail because of networking issues or SSTable corruption, which may require administrator intervention such as running nodetool scrub. Regularly scheduled repairs help maintain cluster health and prevent data inconsistencies.
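The log check described above (one UUID per repair session, only INFO lines for a healthy repair) can be sketched as a small script. This is a minimal illustration and not part of OpsCenter or Cassandra. It assumes each relevant log line starts with its level and carries a "[repair #<uuid>]" tag; the function names and the sample lines are hypothetical, and the samples are synthetic rather than real Cassandra output.

```python
import re
from collections import defaultdict

# Assumption: a log line starts with its level and tags the session
# as "[repair #<uuid>]". Adjust both patterns to your actual log layout.
LEVEL_RE = re.compile(r"^\s*(TRACE|DEBUG|INFO|WARN|ERROR)\b")
SESSION_RE = re.compile(r"\[repair #([0-9a-fA-F-]{36})\]")

# Synthetic lines in the assumed shape, for illustration only.
SAMPLE_LINES = [
    "INFO [thread-1] [repair #4e1b2c30-9857-11e5-8994-feff819cdc9f] example message",
    "INFO [thread-1] [repair #4e1b2c30-9857-11e5-8994-feff819cdc9f] example message",
    "INFO [thread-2] [repair #7a9d4f10-9857-11e5-8994-feff819cdc9f] example message",
    "ERROR [thread-2] [repair #7a9d4f10-9857-11e5-8994-feff819cdc9f] example message",
    "INFO [thread-3] a line that belongs to no repair session",
]


def summarize_repair_sessions(lines):
    """Group lines by repair session UUID and count lines per log level."""
    sessions = defaultdict(lambda: defaultdict(int))
    for line in lines:
        level = LEVEL_RE.match(line)
        session = SESSION_RE.search(line)
        # Skip lines that lack either a level prefix or a session tag.
        if level and session:
            sessions[session.group(1)][level.group(1)] += 1
    return {sid: dict(counts) for sid, counts in sessions.items()}


def unhealthy_sessions(summary):
    """Return the sorted session IDs that logged any WARN or ERROR line."""
    return sorted(
        sid
        for sid, counts in summary.items()
        if counts.get("WARN", 0) or counts.get("ERROR", 0)
    )
```

Calling unhealthy_sessions(summarize_repair_sessions(SAMPLE_LINES)) returns the sessions that need a closer look. Whether a flagged session points to a networking issue or to SSTable corruption still has to be read from the messages themselves.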