
Change Data Capture: Fine Tuning Changefeeds for Performance and Durability

What's this blog post about?

Change Data Capture (CDC) is a powerful CockroachDB feature that can affect the performance of a cluster. Its highly configurable nature, however, lets users trade off costs such as CPU usage and SQL latency against the changefeed behavior they need. This blog post outlines the tradeoffs available when using CDC.

Changefeeds are CockroachDB's distributed change data capture mechanism. They run on every node, sending messages to a sink endpoint (such as Kafka) as rows are inserted, updated, or deleted in the watched table(s). Each node sends checkpointing information back to the coordinating node, which updates the high-water mark timestamp.

When tuning CDC for durability in the face of disaster, consider options such as RequiredAcks (Kafka only), gc.ttlseconds, protect_data_from_gc_on_pause, and on_error='pause', along with setting up monitoring and alerting. For scaling and performance, consider batching, resolved timestamps, the memory budget, message format, compression, scan request parallelism, time-bound iterators, the experimental poll interval, and the number of changefeeds. Understanding the costs and benefits of each option for your application is crucial to fine-tuning CDC for an exact fit with your application's needs and keeping your changefeed pipeline healthy and performant.
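As an illustration, here is a minimal sketch of a durability-oriented changefeed. The table name, broker address, and specific values are placeholder assumptions; the options themselves (gc.ttlseconds, resolved, protect_data_from_gc_on_pause, on_error, and kafka_sink_config with RequiredAcks and Flush batching) are the changefeed knobs the post discusses, as available in CockroachDB v21.2.

    -- Placeholder table and broker; all values are illustrative.

    -- Widen the GC window on the watched table so a paused or lagging
    -- changefeed can still catch up before old row versions are reclaimed.
    ALTER TABLE orders CONFIGURE ZONE USING gc.ttlseconds = 14400;

    -- Create a changefeed tuned toward durability over raw throughput.
    CREATE CHANGEFEED FOR TABLE orders
      INTO 'kafka://kafka-broker:9092'
      WITH
        resolved = '30s',               -- emit resolved timestamps every 30 seconds
        protect_data_from_gc_on_pause,  -- protect watched data from GC while paused
        on_error = 'pause',             -- pause instead of failing on sink errors
        kafka_sink_config = '{"Flush": {"Messages": 1000, "Frequency": "1s"}, "RequiredAcks": "ALL"}';

Trading the other way, for example RequiredAcks "ONE" and larger Flush batches, would reduce sink round trips and CPU cost at the expense of delivery guarantees.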

Company
Cockroach Labs

Date published
Nov. 23, 2021

Author(s)
Abbey Russell

Word count
1983

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.