
Re-imagine Data Monitoring with whylogs and Apache Spark

What's this blog post about?

whylogs is a lightweight, open source data profiling library that enables end-to-end data profiling across the entire software stack. It integrates with Apache Spark to profile data at large scale and can be added to existing data and ML pipelines. The integration is efficient because it requires only a single pass over the data and does not cause any shuffling. whylogs supports both batch and streaming data sets, which makes it suitable for a range of deployment infrastructures. It provides a simple Spark API that extends the Dataset API and runs various metadata and aggregation operations. The library has been designed with the privacy, security, and compliance requirements of modern ML businesses in mind.
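The single-pass, no-shuffle property can be illustrated with a minimal sketch in plain Python. It assumes that each partition is reduced to a small summary independently and that the summaries are then merged. This is not the whylogs or Spark API: `ColumnSummary`, `summarize_partition`, and `profile` are hypothetical names invented for illustration, and the summary tracks only counts, nulls, minimum, and maximum.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional


@dataclass(frozen=True)
class ColumnSummary:
    """Hypothetical mergeable summary of one numeric column (not a whylogs type)."""
    count: int = 0
    nulls: int = 0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Merging only needs the two small summaries, never the raw rows.
        mins = [m for m in (self.minimum, other.minimum) if m is not None]
        maxs = [m for m in (self.maximum, other.maximum) if m is not None]
        return ColumnSummary(
            count=self.count + other.count,
            nulls=self.nulls + other.nulls,
            minimum=min(mins) if mins else None,
            maximum=max(maxs) if maxs else None,
        )


def summarize_partition(values: Iterable[Optional[float]]) -> ColumnSummary:
    """Read one partition exactly once; no other partition is consulted."""
    count = nulls = 0
    lo: Optional[float] = None
    hi: Optional[float] = None
    for v in values:
        count += 1
        if v is None:
            nulls += 1
            continue
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return ColumnSummary(count, nulls, lo, hi)


def profile(partitions: Iterable[Iterable[Optional[float]]]) -> ColumnSummary:
    """Summarize each partition independently, then fold the summaries together."""
    return reduce(ColumnSummary.merge, map(summarize_partition, partitions), ColumnSummary())
```

Because `merge` is associative and commutative, the per-partition summaries can be combined in any order, which is why no rows need to move between partitions. For example, `profile([[1.0, None, 3.5], [], [-2.0, 7.0]])` summarizes three partitions and combines them into one result.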

Company
WhyLabs

Date published
Nov. 23, 2022

Author(s)
Andy Dang

Word count
2091

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.