Clearing the Air on Cassandra Batches

Company

DataStax

Date Published

Oct. 28, 2020

Author

Eric Zietlow

Word count

1576

Language

English

Hacker News points

None

URL

www.datastax.com/blog/2020/10/clearing-air-cassandra-batches

Summary

Batches in Apache Cassandra™ keep denormalized data in sync across multiple tables that hold similar data. A batch takes a set of statements that share common data and runs them as a group from a single coordinator node, so the batch as a whole either passes or fails. Logged batches provide stronger guarantees, but their extra steps add cost and resource load. Unlogged batches can optimize a small number of queries against a single partition: they reduce network traffic because they functionally become a single request from the driver to the coordinator. The number of requests must stay small, however, to avoid placing an unbalanced load on the coordinator node. Batches are not atomic in the traditional sense and have no rollback on failure, but when used correctly they provide strong guarantees for data synchronization.
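
The two batch styles described above can be sketched as plain CQL text. The following is a minimal illustration, not the article's own code: it only builds `BEGIN BATCH ... APPLY BATCH;` strings and does not talk to a cluster. The table names (`users_by_id`, `users_by_email`, `readings`), the helper names, and the cap of 10 statements per batch are all assumptions made for the example. Real application code would normally bind values through prepared statements rather than formatting them into strings.

```python
from collections import defaultdict

# Hypothetical cap for illustration only; this is not a Cassandra default.
MAX_STATEMENTS_PER_BATCH = 10


def build_batch(statements, logged=True):
    """Wrap CQL statements in BEGIN [UNLOGGED] BATCH ... APPLY BATCH;"""
    if not statements:
        raise ValueError("a batch needs at least one statement")
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    body = "\n".join("  " + s.rstrip(";") + ";" for s in statements)
    return f"{header}\n{body}\nAPPLY BATCH;"


def unlogged_batches_by_partition(rows, max_statements=MAX_STATEMENTS_PER_BATCH):
    """Group (sensor_id, ts, value) rows by partition key (sensor_id) and
    emit small unlogged batches that each touch exactly one partition."""
    grouped = defaultdict(list)
    for sensor_id, ts, value in rows:
        # Values are inlined only to keep the sketch short and printable.
        grouped[sensor_id].append(
            "INSERT INTO readings (sensor_id, ts, value) "
            f"VALUES ('{sensor_id}', {ts}, {value})"
        )
    batches = []
    for stmts in grouped.values():
        # Split large groups so no single batch grows past the cap.
        for i in range(0, len(stmts), max_statements):
            batches.append(build_batch(stmts[i:i + max_statements], logged=False))
    return batches


# Logged batch: the same user written to two denormalized tables.
logged_batch = build_batch([
    "INSERT INTO users_by_id (id, email) VALUES (1, 'a@example.com')",
    "INSERT INTO users_by_email (email, id) VALUES ('a@example.com', 1)",
])
```

Grouping by partition key before batching keeps each unlogged batch a single-partition request, and the cap keeps the per-batch statement count small, which is the condition the summary gives for avoiding uneven coordinator load.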