Common Mistakes and Misconceptions

Company

DataStax

Date Published

Oct. 11, 2013

Author

Ben Coverston

Word count

1393

Language

English

Hacker News points

None

URL

www.datastax.com/blog/common-mistakes-and-misconceptions

Summary

The post covers five operational aspects of running Apache Cassandra, an open-source distributed database management system, that are commonly misunderstood:

1. Repair: Repair makes data consistent across replicas, but it is expensive in resources and adds latency while it runs. Running repair weekly is recommended.

2. read_repair_chance: This setting controls how often Cassandra checks for inconsistencies between replicas during reads. The default value is 0.1, which means that 10% of read requests trigger a background read repair.

3. Cleanup: Cleanup removes data that a node no longer owns after a topology change. Run it only when it is needed, not at regular intervals.

4. Compaction: Compaction is an optimization that merges SSTables in the background so that reads cost less IO and CPU time. Forcing a major compaction can make problems with tombstones and updates worse, so it is better to let compaction run its natural course.

5. JVM heap size: Cassandra performs best with a relatively small heap (ideally less than 12 GB). Memory not allocated to the heap is not wasted; Cassandra uses it for memory-mapped IO, which improves overall efficiency.
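
To make the read_repair_chance figure concrete, the following is a minimal Python sketch of a per-read probability check. It is not Cassandra's code; the function names, the seeded random generator, and the simulated read counts are assumptions made for illustration only.

```python
import random


def triggers_read_repair(read_repair_chance: float, rng: random.Random) -> bool:
    """Decide whether one read also starts a background read repair.

    Hypothetical helper: models the setting as an independent
    per-read probability, as the summary describes it.
    """
    return rng.random() < read_repair_chance


def count_read_repairs(num_reads: int, read_repair_chance: float = 0.1, seed: int = 0) -> int:
    """Simulate num_reads reads and count how many trigger a read repair."""
    rng = random.Random(seed)  # seeded so repeated runs give the same count
    return sum(triggers_read_repair(read_repair_chance, rng) for _ in range(num_reads))


if __name__ == "__main__":
    reads = 100_000
    repairs = count_read_repairs(reads)
    print(f"{repairs} of {reads} simulated reads triggered a read repair")
```

With the default of 0.1, roughly one simulated read in ten starts a repair. Setting the value to 0 disables the check entirely, and setting it to 1 applies it to every read.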