How to Validate Data Quality for ML Monitoring

What's this blog post about?

Validating data quality is crucial for machine learning applications because poor-quality data can cause both pipeline and model failures. This post explains why data quality validation is an essential part of the MLOps process and shows how to monitor data quality in a Python environment with whylogs, an open-source data logging library. The techniques apply to any application that uses a data pipeline. Data quality validation checks that data has the expected structure and that its values fall within the expected ranges, which helps prevent unwanted model behavior in production. With whylogs, users create lightweight profiles: statistical summaries of the data that can be used to measure data quality, data drift, and model drift. The whylogs feature for data validation is called constraints; constraints can be set on default profile metrics or on user-defined custom metrics.
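
To make the profile-plus-constraints idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the whylogs API: `ColumnProfile`, `profile_rows`, `Constraint`, and `validate` are hypothetical names invented for illustration, and the sample rows are made up. It only shows the general pattern the summary describes, which is to reduce the data to a small statistical summary and then check rules against that summary rather than against the raw rows.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ColumnProfile:
    """Hypothetical lightweight summary of one numeric column."""
    count: int = 0
    null_count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    total: float = 0.0

    def track(self, value: Optional[float]) -> None:
        # Update the running statistics with a single value.
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.total += value

    @property
    def mean(self) -> float:
        non_null = self.count - self.null_count
        return self.total / non_null if non_null else math.nan


def profile_rows(rows: Iterable[Dict[str, Optional[float]]]) -> Dict[str, ColumnProfile]:
    """Build one ColumnProfile per column name seen in the rows."""
    profiles: Dict[str, ColumnProfile] = {}
    for row in rows:
        for column, value in row.items():
            profiles.setdefault(column, ColumnProfile()).track(value)
    return profiles


@dataclass
class Constraint:
    """Hypothetical rule: a named predicate over one column's profile."""
    name: str
    column: str
    check: Callable[[ColumnProfile], bool]


def validate(profiles: Dict[str, ColumnProfile], constraints: List[Constraint]) -> Dict[str, bool]:
    """Evaluate each constraint; a column absent from the profile fails."""
    report: Dict[str, bool] = {}
    for constraint in constraints:
        column_profile = profiles.get(constraint.column)
        report[constraint.name] = (
            column_profile is not None and constraint.check(column_profile)
        )
    return report


# Made-up sample data: one income value is missing.
rows = [
    {"age": 34, "income": 52000.0},
    {"age": 29, "income": None},
    {"age": 41, "income": 61000.0},
]

profiles = profile_rows(rows)
constraints = [
    Constraint("age is non-negative", "age", lambda p: p.minimum >= 0),
    Constraint("age is at most 120", "age", lambda p: p.maximum <= 120),
    Constraint("income has no nulls", "income", lambda p: p.null_count == 0),
]

report = validate(profiles, constraints)
for name, passed in report.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Because the rules run against the summary and not the raw data, the same checks can be applied to any batch once it has been profiled. In whylogs itself the profile and the constraint definitions come from the library, so consult its documentation for the actual calls.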

Company
WhyLabs

Date published
July 27, 2022

Author(s)
Sage Elliott

Word count
1832

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.