Clearing the Air on Cassandra Batches
Batches in Apache Cassandra™ are used to keep denormalized data in sync across multiple tables that hold similar data. A batch takes a set of statements with common data and runs them as a group from a single coordinator node, giving the whole batch a single pass-fail outcome. Logged batches provide stronger guarantees, but their extra steps and resource load make them more expensive. Unlogged batches can optimize a small number of statements against a single partition, reducing network traffic because they effectively become a single request from the driver to the coordinator. In either case, the number of statements in a batch should stay small to avoid putting unbalanced load on the coordinator node. Batches are not atomic in the traditional sense and have no rollback on failure, but when used correctly they provide strong guarantees for data synchronization.
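As a rough illustration of the two batch types described above, the sketch below assembles CQL batch text in plain Python. The helper `build_batch` and the tables `users_by_id` and `users_by_email` are hypothetical names chosen for this example, not taken from the original post; only the `BEGIN BATCH`, `BEGIN UNLOGGED BATCH`, and `APPLY BATCH` keywords are standard CQL. It builds the statement text only and does not connect to a cluster.

```python
def build_batch(statements, logged=True):
    """Wrap individual CQL write statements in a single batch.

    Hypothetical helper for illustration. In CQL, BEGIN BATCH is a
    logged batch by default; BEGIN UNLOGGED BATCH skips the batch log.
    """
    if not statements:
        raise ValueError("a batch needs at least one statement")
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalize each statement so it ends with exactly one semicolon.
    body = "\n".join("  " + s.strip().rstrip(";") + ";" for s in statements)
    return header + "\n" + body + "\nAPPLY BATCH;"


# Two hypothetical denormalized tables that hold the same user record.
user_writes = [
    "INSERT INTO users_by_id (id, email, name) "
    "VALUES (42, 'ada@example.com', 'Ada')",
    "INSERT INTO users_by_email (email, id, name) "
    "VALUES ('ada@example.com', 42, 'Ada')",
]

# Logged batch: keeps the two tables in sync at extra cost.
batch_cql = build_batch(user_writes)
print(batch_cql)
```

Passing `logged=False` produces an unlogged batch, which per the summary above is best reserved for a small number of statements that all target the same partition.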
Company
DataStax
Date published
Oct. 28, 2020
Author(s)
Eric Zietlow
Word count
1576
Hacker News points
None found.
Language
English