Detecting Semantic Drift within Image Data: Monitoring Context-Full Data with whylogs
This article discusses using whylogs to monitor the data ingestion pipeline of machine learning systems and to detect concept drift, specifically in image data. It presents two scenarios that demonstrate how to create more general semantic metrics and how to monitor specialized semantic information derived directly from the dataset. In the first scenario, image metadata and properties such as hue, saturation, and brightness (HSB) are used to detect changes in the data. In the second, semantic drift is detected by generating feature embeddings through transfer learning; custom features, such as the distance from each logged image's embedding to the cluster center of each class, measure how far an image lies from the "ideal" representation of that class. The article concludes by discussing how whylogs helps detect data drift in images and gives examples of integrating these approaches at different stages of the data pipeline for full observability of machine learning applications.
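The two kinds of features described in the summary can be sketched with the Python standard library alone. This is a minimal illustration, not the article's implementation and not the whylogs API: the function names (`mean_hsb`, `centroid`, `distances_to_centroids`), the toy pixel values, and the two-dimensional "embeddings" are all assumptions made for the example. In practice the embeddings would come from a pretrained network, and the resulting numbers would be logged as custom features.

```python
import colorsys
import math
from statistics import mean


def mean_hsb(pixels):
    """Average hue, saturation, and brightness of (r, g, b) pixels in 0..255.

    Simplification: hue is circular, so a plain mean is only a rough summary.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    return tuple(mean(channel) for channel in zip(*hsv))


def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    return [mean(column) for column in zip(*vectors)]


def distances_to_centroids(embedding, centroids):
    """Euclidean distance from one embedding to each class centroid."""
    return {label: math.dist(embedding, center) for label, center in centroids.items()}


# Scenario 1 (hypothetical data): summarize an image by its mean HSB values.
pixels = [(255, 0, 0), (128, 0, 0)]  # two red pixels, one darker
hue, saturation, brightness = mean_hsb(pixels)

# Scenario 2 (hypothetical data): toy 2-D embeddings standing in for the
# output of a pretrained model on reference images of each class.
reference = {
    "cat": [[0.0, 0.0], [2.0, 0.0]],
    "dog": [[10.0, 10.0], [10.0, 12.0]],
}
centroids = {label: centroid(vectors) for label, vectors in reference.items()}

# Distances for a newly ingested image; these are the values one would monitor.
new_embedding = [1.0, 1.0]
distances = distances_to_centroids(new_embedding, centroids)
```

A shift over time in the distribution of these values (for example, darker images on average, or embeddings moving away from every class centroid) is the kind of signal the monitored features are meant to surface.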
Company
WhyLabs
Date published
Aug. 7, 2021
Author(s)
WhyLabs Admin
Word count
2726
Hacker News points
2
Language
English