Clearing the Air on Cassandra Batches

What's this blog post about?

Batches in Apache Cassandra™ keep denormalized data in sync across multiple tables that hold similar data. A batch takes a set of statements that share common data and runs them as a group from a single coordinator node, so the batch as a whole either passes or fails. Logged batches provide stronger guarantees, but their extra steps and resource load make them more expensive. Unlogged batches can optimize a small number of queries against a single partition: they reduce network traffic because they functionally become a single request from the driver to the coordinator. The number of requests must stay small, however, to avoid placing an unbalanced load on the coordinator node. Batches are not atomic in the traditional sense and do not roll back on failure, but they can provide strong guarantees for data synchronization when used correctly.
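To make the two batch types concrete, here is a minimal Python sketch that only builds CQL text and does not talk to a cluster. The table names (`users_by_id`, `users_by_email`), the helper names, and the `max_size` value of 10 are assumptions chosen for illustration and do not come from the original post. The first helper wraps statements in `BEGIN BATCH ... APPLY BATCH`; the second groups rows by partition key and splits each group into small chunks, so that each unlogged batch touches one partition and stays small.

```python
from collections import defaultdict


def build_batch(statements, logged=True):
    """Wrap CQL statements in BEGIN [UNLOGGED] BATCH ... APPLY BATCH."""
    if not statements:
        raise ValueError("a batch needs at least one statement")
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalize each statement so it ends with exactly one semicolon.
    body = "\n".join("  " + s.rstrip(";") + ";" for s in statements)
    return f"{header}\n{body}\nAPPLY BATCH;"


def single_partition_batches(rows, partition_key, max_size=10):
    """Group rows by partition key, then split each group into small chunks.

    max_size is an illustrative limit, not a recommended value.
    """
    by_partition = defaultdict(list)
    for row in rows:
        by_partition[row[partition_key]].append(row)
    batches = []
    for group in by_partition.values():
        for start in range(0, len(group), max_size):
            batches.append(group[start:start + max_size])
    return batches


if __name__ == "__main__":
    # Logged batch: the same user written to two denormalized tables
    # (hypothetical table names).
    print(build_batch([
        "INSERT INTO users_by_id (id, email) VALUES ('u1', 'a@example.com')",
        "INSERT INTO users_by_email (email, id) VALUES ('a@example.com', 'u1')",
    ]))

    # Unlogged batches: one small batch per partition.
    readings = [{"sensor": "s1", "value": v} for v in range(3)]
    for chunk in single_partition_batches(readings, "sensor", max_size=2):
        stmts = [
            f"INSERT INTO readings (sensor, value) VALUES ('{r['sensor']}', {r['value']})"
            for r in chunk
        ]
        print(build_batch(stmts, logged=False))
```

In real code, a driver's prepared statements and bound parameters would be preferable to string formatting; the string building here is only meant to show the shape of each batch.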

Company
DataStax

Date published
Oct. 28, 2020

Author(s)
Eric Zietlow

Word count
1576

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.