Data Quality Monitoring for Kafka, Beyond Schema Validation
Data quality issues are hard to catch in applications that process large amounts of data. Schema validation is a good start, but it does not cover every aspect of data quality. Monitoring distribution shifts, unique value ratios, and data type counts in production can help detect the issues that result in "weird data." Tools like whylogs can set up this kind of monitoring on Kafka streams by producing profiles, which are lightweight statistical representations of the data. Profiles can be compared, visualized, and monitored for changes, which helps identify potential data quality issues early.
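To make the idea of a profile concrete, here is a minimal sketch in plain Python of the kind of per-field statistics the summary mentions: data type counts and unique value ratios. It is not the whylogs API or its implementation. The names ColumnProfile, profile_messages, and type_drift are hypothetical, the messages are plain dicts standing in for decoded Kafka records, and distinct values are kept in an exact set where a production tool would more likely use approximate sketches.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class ColumnProfile:
    """Hypothetical stand-in for a per-field statistical summary."""
    count: int = 0
    type_counts: Counter = field(default_factory=Counter)
    # Exact set of seen values (by repr); an assumption made for simplicity.
    distinct: set = field(default_factory=set)

    def track(self, value):
        """Update the summary with one observed value."""
        self.count += 1
        self.type_counts[type(value).__name__] += 1
        self.distinct.add(repr(value))

    @property
    def unique_ratio(self):
        """Distinct values divided by total values seen."""
        return len(self.distinct) / self.count if self.count else 0.0


def profile_messages(messages):
    """Build one ColumnProfile per field from an iterable of dict messages."""
    columns = defaultdict(ColumnProfile)
    for message in messages:
        for name, value in message.items():
            columns[name].track(value)
    return dict(columns)


def type_drift(baseline, current):
    """Return type names seen in `current` that never appeared in `baseline`."""
    return sorted(set(current.type_counts) - set(baseline.type_counts))


# Illustrative data only: a baseline window and a later window of messages.
baseline = profile_messages([
    {"user_id": 1, "amount": 9.99},
    {"user_id": 2, "amount": 4.50},
])
current = profile_messages([
    {"user_id": 3, "amount": "4.50"},  # a string where a float is expected
    {"user_id": 3, "amount": None},    # a missing value
])

print(type_drift(baseline["amount"], current["amount"]))
print(baseline["user_id"].unique_ratio, current["user_id"].unique_ratio)
```

Comparing the two windows flags the new types in the amount field and the drop in the unique value ratio of user_id, the sort of change that a loose schema could still accept.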
Company
WhyLabs
Date published
Aug. 23, 2022
Author(s)
Anthony Naddeo
Word count
1824
Hacker News points
None found.
Language
English