Re-imagine Data Monitoring with whylogs and Apache Spark
whylogs is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. It integrates with Apache Spark to profile data at large scale and can be added to existing data and ML pipelines. The integration is efficient: it requires only a single pass over the data and does not cause any shuffling. whylogs supports both batch and streaming data sets, making it suitable for a variety of deployment infrastructures. It provides a simple Spark API that extends Spark's Dataset API and runs various metadata and aggregation operations. The library is open source and has been designed with the privacy, security, and compliance requirements of modern ML businesses in mind.
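The single-pass, no-shuffle property can be illustrated with a small sketch in plain Python. This is not the whylogs API or its implementation; `ColumnProfile`, `profile_partition`, and `merge_profiles` are hypothetical names, and the summary uses simple counters where a real profiler would likely use more compact statistical sketches. The idea shown is only that each partition is summarized independently in one pass, and the small summaries (not the rows) are then merged.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Optional


def _pick(a: Optional[float], b: Optional[float], fn) -> Optional[float]:
    """Combine two optional values with fn, treating None as 'no data'."""
    if a is None:
        return b
    if b is None:
        return a
    return fn(a, b)


@dataclass
class ColumnProfile:
    """Hypothetical mergeable summary of one numeric column."""
    count: int = 0
    nulls: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def track(self, value: Optional[float]) -> None:
        """Update the summary with one value (None counts as a null)."""
        self.count += 1
        if value is None:
            self.nulls += 1
            return
        self.total += value
        self.minimum = _pick(self.minimum, value, min)
        self.maximum = _pick(self.maximum, value, max)

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        """Return a new summary equal to profiling both inputs together."""
        return ColumnProfile(
            count=self.count + other.count,
            nulls=self.nulls + other.nulls,
            total=self.total + other.total,
            minimum=_pick(self.minimum, other.minimum, min),
            maximum=_pick(self.maximum, other.maximum, max),
        )


def profile_partition(rows: Iterable[Optional[float]]) -> ColumnProfile:
    """Summarize one partition in a single pass over its rows."""
    profile = ColumnProfile()
    for value in rows:
        profile.track(value)
    return profile


def merge_profiles(profiles: List[ColumnProfile]) -> ColumnProfile:
    """Fold per-partition summaries into one; only summaries are combined."""
    return reduce(lambda a, b: a.merge(b), profiles, ColumnProfile())
```

Because `merge` gives the same result as profiling the combined rows directly, each worker only needs to see its own partition, and only the small summary objects travel to wherever the final merge happens.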
Company
WhyLabs
Date published
Nov. 23, 2022
Author(s)
Andy Dang
Word count
2091
Language
English
Hacker News points
None found.