Shift Left: Bad Data in Event Streams, Part 2
The article discusses strategies for handling bad data in event streams, which, unlike batch-processed data, are immutable once written. The first technique is prevention: schemas, tests, and data quality constraints ensure that data is well defined from the start, which saves considerable trouble, just as it does in batch processing. The next layer of defense is event design. State events, which carry the complete state of a record rather than only a change, help produce well-defined data in the first place, enable event-carried state transfer, and let consumers infer deltas by comparing an event against the previous one. Because each state event fully replaces the last, compaction, a process in Apache Kafka, can delete older versions of records that share the same record key. Delta-style events are much harder to fix: they are tightly coupled to consumer business logic and cannot be compacted. The two strategies for repairing bad delta events are build-forward techniques, or rewinding, rebuilding, and retrying the topic; the latter requires significant intervention and can be expensive and complex. Effective event design and prevention remain the most important defenses against bad data in event streams.
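As a rough illustration of the state-versus-delta distinction the summary refers to, here is a minimal sketch (the event types, field names, and cart example are hypothetical, not taken from the article): a delta event records only a change, so consumers must replay every prior delta to reconstruct state, while a state event carries the full record, so a corrected event with the same key simply supersedes the bad one.

```java
import java.util.HashMap;
import java.util.Map;

public class StateVsDeltaSketch {
    // A delta event records only what changed; to recover current state,
    // consumers must replay every prior delta, so a bad delta cannot simply
    // be overwritten later.
    record ItemAddedToCart(String cartId, String itemId, int quantity) {}

    // A state event carries the complete current state of the entity. A
    // corrected event with the same key replaces the bad one for any consumer
    // that keeps only the latest value per key, and compaction can later
    // remove the superseded versions.
    record CartState(String cartId, Map<String, Integer> itemQuantities) {}

    public static void main(String[] args) {
        // Consumer-side materialization: the latest state event per key wins.
        Map<String, CartState> latestByKey = new HashMap<>();
        latestByKey.put("cart-1", new CartState("cart-1", Map.of("sku-9", 3))); // bad data
        latestByKey.put("cart-1", new CartState("cart-1", Map.of("sku-9", 1))); // corrected record
        System.out.println(latestByKey.get("cart-1")); // only the corrected state remains
    }
}
```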
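Compaction itself is a per-topic setting in Kafka. The snippet below is a minimal sketch, assuming a hypothetical topic name and a local broker, of creating a compacted topic with the Java Admin client; with cleanup.policy=compact the broker eventually retains only the newest record for each key.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // cleanup.policy=compact tells the broker to keep only the latest
            // record per key, deleting older versions during log compaction.
            NewTopic cartState = new NewTopic("cart-state", 3, (short) 1)
                .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(cartState)).all().get();
        }
    }
}
```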
Company
Confluent
Date published
Oct. 11, 2024
Author(s)
Adam Bellemare
Word count
4417
Language
English
Hacker News points
None found.