Data Logging: Sampling versus Profiling

What's this blog post about?

The article discusses why data logging matters for robust ML/AI applications and compares two approaches to it: sampling and profiling. Sampling selects a subset of records from a larger data stream, either randomly or programmatically, while profiling collects statistical measurements of the data. The author argues that profiling is superior to sampling because it offers a lightweight, robust way to characterize distributions for all types of data encountered in ML. Profiling also captures rare events and outliers accurately, and these are often correlated with data issues. The article presents whylogs, an open-source library developed by the team at WhyLabs that enables scalable, statistical data logging and profiling in only a few lines of code. It also highlights how profiles can be used for automated monitoring of ML/AI applications and pipelines, since they are lightweight, controlled, simple, human-centered, and statistical.
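
The contrast between the two approaches can be illustrated with a minimal sketch. It uses only the Python standard library and is not the whylogs implementation; the names `reservoir_sample` and `NumericProfile` and the synthetic data are assumptions made for illustration. The sampler keeps a fixed-size uniform random subset of the stream, while the profile updates a handful of running statistics for every value it sees.

```python
import random
from dataclasses import dataclass


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from a stream (Algorithm R)."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = x
    return sample


@dataclass
class NumericProfile:
    """Running summary statistics, updated one value at a time (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def track(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self):
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical data: 10,000 ordinary values, one of which is an extreme outlier.
rng = random.Random(0)
values = [rng.gauss(50.0, 5.0) for _ in range(10_000)]
values[1234] = 1_000_000.0

# Sampling: store 100 raw records chosen uniformly at random.
sample = reservoir_sample(values, k=100, rng=rng)

# Profiling: store a few numbers that summarize every record.
profile = NumericProfile()
for v in values:
    profile.track(v)

print("sample size:", len(sample), "| sample max:", max(sample))
print("profile count:", profile.count, "| profile max:", profile.maximum)
```

Under these assumptions, a 100-item uniform sample of 10,000 records contains any one particular record with probability of only about 1%, so the outlier is usually absent from the sample. The profile always records it in `maximum`, and its storage cost stays constant no matter how long the stream is.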

Company
WhyLabs

Date published
Oct. 29, 2020

Author(s)
Bernease Herman

Word count
1433

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.