Clearing the Air on Cassandra Batches
Batches in Apache Cassandra® are used to keep denormalized data in sync across multiple tables that hold the same data. A batch takes a set of statements with common data and runs them as a group from a single coordinator node, reporting success or failure for the batch as a whole. Logged batches are the default and provide stronger guarantees: the batch log is replicated to two nodes and failed statements are retried aggressively. Unlogged batches skip the batch log but still route all requests through the same coordinator as a single operation, trading fewer network round trips for extra load on that coordinator. Batches are not atomic in the traditional sense and have no rollback on failure, but they can provide a strong guarantee that all requests will eventually complete. Understanding this cost and using batches correctly is crucial for performance and data consistency in Cassandra.
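To make the logged versus unlogged distinction concrete, here is a minimal sketch that only assembles the CQL text of a batch; it does not connect to a cluster. The helper `build_batch` and the tables `users_by_id` and `users_by_email` are hypothetical names chosen for illustration and do not come from the original post.

```python
def build_batch(statements, logged=True):
    """Wrap CQL write statements in a BATCH block (illustrative helper).

    Logged is the CQL default, so no extra keyword is emitted for it;
    an unlogged batch is requested with the UNLOGGED keyword.
    """
    if not statements:
        raise ValueError("a batch needs at least one statement")
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalize each statement so it ends with exactly one semicolon.
    body = [f"  {s.strip().rstrip(';')};" for s in statements]
    return "\n".join([header, *body, "APPLY BATCH;"])


# Hypothetical denormalized tables that store the same user record
# under two different partition keys.
writes = [
    "INSERT INTO users_by_id (id, email, name) "
    "VALUES (1, 'ada@example.com', 'Ada')",
    "INSERT INTO users_by_email (email, id, name) "
    "VALUES ('ada@example.com', 1, 'Ada')",
]

logged_cql = build_batch(writes)                   # default: uses the batch log
unlogged_cql = build_batch(writes, logged=False)   # skips the batch log

print(logged_cql)
print(unlogged_cql)
```

Both strings go to a single coordinator when executed; only the logged form asks Cassandra to record the batch in the batch log before applying it.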
Company
DataStax
Date published
Oct. 28, 2020
Author(s)
Eric Zietlow
Word count
1576
Language
English
Hacker News points
None found.