Data Logging with whylogs
whylogs is an open-source data logging tool that helps users detect data drift, prevent ML model performance degradation, and validate the quality of their data. With whylogs, users generate statistical summaries (termed whylogs profiles) from data as it flows through their data pipelines and into their machine learning models. Comparing these profiles over time reveals changes in the data, such as data drift or data quality problems. The tool supports both tabular and complex data and runs natively in Python and JVM environments. It also supports batch processing (e.g., Apache Spark) and streaming (e.g., Apache Kafka). The v1 release brings a simpler API, new data constraints, new profile visualizations, faster performance, and a usability refresh. whylogs v1 is built for scale and optimized for massive datasets: it generates profiles for large datasets more than 500x faster than the previous version.
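To make the idea of a profile concrete, here is a minimal conceptual sketch in plain Python. It does not use the whylogs API: `ColumnProfile`, `profile`, and `mean_shift` are hypothetical names invented for illustration, and the drift check (mean shift measured in baseline standard deviations, with an arbitrary threshold of 3) is a deliberately simple stand-in, not the method whylogs uses.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Running summary of one numeric column (hypothetical, not a whylogs class)."""
    count: int = 0
    nulls: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        # Fold one value into the summary; the raw value itself is not stored.
        self.count += 1
        if value is None:
            self.nulls += 1
            return
        self.total += value
        self.total_sq += value * value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        n = self.count - self.nulls
        return self.total / n if n else float("nan")

    @property
    def stddev(self):
        # Sample standard deviation computed from the running sums.
        n = self.count - self.nulls
        if n < 2:
            return 0.0
        variance = (self.total_sq - self.total ** 2 / n) / (n - 1)
        return math.sqrt(max(variance, 0.0))


def profile(values):
    """Build a profile from one batch of values."""
    p = ColumnProfile()
    for v in values:
        p.track(v)
    return p


def mean_shift(baseline, current):
    """Shift of the current mean, in units of baseline standard deviations."""
    if baseline.stddev == 0:
        return 0.0 if current.mean == baseline.mean else math.inf
    return abs(current.mean - baseline.mean) / baseline.stddev


# Two batches of the same column, e.g. yesterday's and today's data.
baseline = profile([10.0, 11.0, 9.0, 10.0, None, 10.0])
today = profile([15.0, 16.0, 14.0, 15.0, 15.0, None])

# Assumption: a shift above 3 baseline standard deviations counts as drift.
drifted = mean_shift(baseline, today) > 3.0
```

The point of the sketch is that only the small summary object needs to be kept per batch, so batches can be compared over time without retaining the underlying records.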
Company
WhyLabs
Date published
May 31, 2022
Author(s)
WhyLabs Admin
Word count
1659
Language
English
Hacker News points
1