Shift Left: Bad Data in Event Streams, Part 1
Bad data is any data that does not conform to what is expected, including malformed or corrupted records. In event streams, bad data can cause serious issues and outages for every downstream data user. The main strategies for mitigating and fixing bad data in streams are prevention, event design, and rewind, rebuild, and retry. Prevention is the most effective: schemas, testing, and validation rules keep bad data from entering the stream in the first place. Event design lets producers issue corrections that overwrite previously published bad data, while rewind, rebuild, and retry serves as the fallback when all else fails. Bad data can creep into data sets in many ways, but good data practices built around prevention are the most effective way to deal with it.
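As a rough illustration of the first two strategies, the sketch below validates events against a schema before producing them (prevention) and then issues a correction by republishing under the same key (event design), assuming a compacted topic or consumers that keep only the latest value per key. The confluent-kafka client, jsonschema, and the "orders" topic and field names are illustrative assumptions, not details taken from the article.

```python
# Minimal sketch of prevention and event-design corrections for a Kafka stream.
# Topic, schema, and field names are made up for illustration.
import json

from confluent_kafka import Producer
from jsonschema import ValidationError, validate

# Prevention: only events that match the schema ever reach the topic.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
    },
    "required": ["order_id", "amount"],
}

producer = Producer({"bootstrap.servers": "localhost:9092"})


def produce_order(event: dict) -> None:
    """Validate the event before it enters the stream, then produce it keyed by order_id."""
    try:
        validate(instance=event, schema=ORDER_SCHEMA)
    except ValidationError as err:
        # Reject at the source instead of letting downstream consumers break.
        raise ValueError(f"rejected bad event: {err.message}") from err
    producer.produce("orders", key=event["order_id"], value=json.dumps(event))
    producer.flush()


# Event design: if a bad value slips through anyway, publish a corrected event
# under the same key; with a compacted topic, the correction overwrites the
# earlier bad record for that key.
produce_order({"order_id": "42", "amount": 19.99})  # original event with a wrong amount
produce_order({"order_id": "42", "amount": 21.99})  # correction, same key
```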
Company
Confluent
Date published
Oct. 4, 2024
Author(s)
-
Word count
4397
Language
English
Hacker News points
None found.