Hands-Free Kafka Replication: A Lesson in Operational Simplicity
The post discusses the challenges of tuning Apache Kafka's replication protocol, particularly when a single cluster serves workloads of varying sizes. Unexpected behavior and false alarms create manual operational overhead and churn. The root cause is the way replica lag is measured, which forces users to guess threshold values from expected traffic patterns. To address this, Kafka introduces a new model for detecting out-of-sync replicas that removes the guesswork and puts an upper bound on message commit latency. The change is set to be available in the next version of the Confluent Platform.
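The difference between the two detection models can be illustrated with a minimal sketch. It assumes the old check compares a follower's lag in messages against a guessed threshold, and that the new check only asks how long it has been since the follower last caught up with the leader. The class, function names, and threshold values below are hypothetical and chosen for illustration; they are not Kafka's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    """Hypothetical view of a follower replica as seen by the leader."""
    log_end_offset: int      # last offset the follower has replicated
    last_caught_up_ms: int   # last time the follower was fully caught up


def out_of_sync_by_messages(leader_end_offset: int, follower: Follower,
                            max_lag_messages: int) -> bool:
    """Message-count model: the threshold must be guessed from traffic."""
    return leader_end_offset - follower.log_end_offset > max_lag_messages


def out_of_sync_by_time(now_ms: int, follower: Follower,
                        max_lag_ms: int) -> bool:
    """Time-based model: independent of how large a produce burst is."""
    return now_ms - follower.last_caught_up_ms > max_lag_ms


# A healthy follower that was fully caught up 50 ms ago, just before a
# burst of 5,000 messages landed on the leader.
now_ms = 1_000_000
follower = Follower(log_end_offset=10_000, last_caught_up_ms=now_ms - 50)
leader_end_offset = 15_000

# Illustrative thresholds (assumptions, not recommended settings).
false_alarm = out_of_sync_by_messages(leader_end_offset, follower, 4_000)
time_based = out_of_sync_by_time(now_ms, follower, 10_000)

print(f"message-count model flags follower: {false_alarm}")
print(f"time-based model flags follower:    {time_based}")
```

In this scenario the message-count check flags a healthy follower simply because the burst exceeded the guessed threshold, while the time-based check does not. The time threshold also bounds how long a stuck follower can hold up commits, which is the latency bound the summary refers to.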
Company
Confluent
Date published
July 1, 2015
Author(s)
Lucia Cerchie, Neha Narkhede, Josep Prat
Word count
1813
Hacker News points
None found.
Language
English