Data Quality Monitoring for Kafka, Beyond Schema Validation

What's this blog post about?

Data quality problems are hard to catch in applications that process large amounts of data. Schema validation is a good start, but it doesn't cover every aspect of data quality: a message can match the schema and still carry values that are wrong. Monitoring distribution shifts, unique value ratios, and data type counts in production helps detect the issues that result in "weird data." Tools like whylogs can be used to set up this kind of monitoring on Kafka streams by building lightweight statistical representations of the data, called profiles. Profiles can be compared, visualized, and monitored for changes, which helps identify potential data quality issues early on.
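To make the three statistics named above concrete, here is a minimal sketch in plain Python. It is not the whylogs API and not the code from the original post. It assumes Kafka messages have already been consumed and decoded into dictionaries, and the names `profile_batch`, `mean_shift`, and the `price` field are hypothetical, chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def profile_batch(records, field):
    """Summarize one field across a batch of decoded messages (hypothetical helper)."""
    values = [r.get(field) for r in records]
    # Data type counts: how many values of each Python type were seen.
    type_counts = Counter(type(v).__name__ for v in values)
    present = [v for v in values if v is not None]
    # Unique value ratio: distinct non-null values over all non-null values.
    unique_ratio = len(set(present)) / len(present) if present else 0.0
    # Only real numbers feed the mean; bool is excluded because it subclasses int.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "count": len(values),
        "type_counts": dict(type_counts),
        "unique_ratio": unique_ratio,
        "mean": mean(numeric) if numeric else None,
    }


def mean_shift(baseline, current):
    """Relative change in the mean between two profiles, or None if not comparable."""
    if baseline["mean"] is None or current["mean"] is None or baseline["mean"] == 0:
        return None
    return abs(current["mean"] - baseline["mean"]) / abs(baseline["mean"])


# Illustrative batches: the second one has a string, a null, and a much larger value.
baseline_batch = [{"price": 10.0}, {"price": 12.0}, {"price": 11.0}]
current_batch = [{"price": 1000.0}, {"price": "1200"}, {"price": None}]

baseline_profile = profile_batch(baseline_batch, "price")
current_profile = profile_batch(current_batch, "price")
shift = mean_shift(baseline_profile, current_profile)
```

Comparing the mean is a deliberately crude stand-in for a distribution shift check. A real profile would track richer summaries such as quantiles or histograms, but the workflow is the same: summarize each batch, then compare the summary against a baseline rather than inspecting raw messages.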

Company
WhyLabs

Date published
Aug. 23, 2022

Author(s)
Anthony Naddeo

Word count
1824

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.