
Clearing the Air on Cassandra Batches

What's this blog post about?

Batches in Apache Cassandra® are used to keep denormalized data in sync across multiple tables that hold copies of the same data. A batch takes a set of statements that write related data and runs them as a group through a single coordinator node, returning one pass/fail result for the batch as a whole. Logged batches are the default; they provide stronger guarantees by writing the batch log to two other nodes and aggressively retrying until every statement succeeds. Unlogged batches skip the batch log but still send all statements to one coordinator as a single operation, which saves network round trips at the cost of extra load on that coordinator. Batches are not atomic in the traditional sense and have no rollback on failure, but a logged batch does guarantee that all of its statements will eventually complete. Understanding these trade-offs and using batches correctly is key to both performance and data consistency in Cassandra.
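To make the logged/unlogged distinction concrete, here is a minimal sketch that only builds CQL text; it does not connect to a cluster. It assumes two hypothetical denormalized tables, `users_by_id` and `users_by_email`, both storing the same user with text columns. The `User` class and `build_user_batch` helper are illustrative names, not part of any Cassandra API. The only real syntax relied on is CQL's `BEGIN BATCH` / `BEGIN UNLOGGED BATCH` ... `APPLY BATCH;`.

```python
from dataclasses import dataclass


# Hypothetical record: the same user is written to two tables,
# each keyed differently, so both copies must be kept in sync.
@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str


def cql_literal(value: str) -> str:
    """Quote a string as a CQL text literal (embedded single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_user_batch(user: User, logged: bool = True) -> str:
    """Return CQL text writing one user to both hypothetical tables as one batch.

    BEGIN BATCH is a logged batch (Cassandra's default, uses the batch log);
    BEGIN UNLOGGED BATCH skips the batch log but still goes to one coordinator.
    """
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    uid, email, name = (cql_literal(v) for v in (user.user_id, user.email, user.name))
    statements = [
        # Copy 1: table partitioned by user_id (hypothetical schema).
        f"  INSERT INTO users_by_id (user_id, email, name) VALUES ({uid}, {email}, {name});",
        # Copy 2: same data, partitioned by email (hypothetical schema).
        f"  INSERT INTO users_by_email (email, user_id, name) VALUES ({email}, {uid}, {name});",
    ]
    return "\n".join([header, *statements, "APPLY BATCH;"])


if __name__ == "__main__":
    print(build_user_batch(User("u1", "ann@example.com", "Ann")))
    print(build_user_batch(User("u1", "ann@example.com", "Ann"), logged=False))
```

Building CQL strings by hand is only for illustration; in application code the DataStax Python driver's `BatchStatement` (with `BatchType.LOGGED` or `BatchType.UNLOGGED`) and prepared statements would normally be used instead.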

Company
DataStax

Date published
Oct. 28, 2020

Author(s)
Eric Zietlow

Word count
1576

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.