Company
Date Published
Feb. 23, 2023
Author
Bernease Herman
Word count
1375
Language
English
Hacker News points
None

Summary

Production data systems often rely on structured tabular data. More complex data types such as images, text, and audio are instead represented as embeddings, whose many columns must be manipulated and aggregated together rather than one at a time. Embeddings are heavily used in machine learning for tasks such as natural language understanding, computer vision, audio processing, and tabular machine learning. WhyLabs recently released features that make it easier to profile and monitor high-dimensional embedding data without manually exploring individual data points. The approach compares each data point to several meaningful reference points within the embedding vector space. This improves detection of issues in an organization's data and gives actionable insight into which types of data are most related to potential problems.
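
The reference-point comparison described above can be illustrated with a small sketch. This is not the WhyLabs or whylogs implementation. The function names, the choice of cosine distance, and the toy "cat" and "dog" reference vectors are all assumptions made for illustration. Each embedding is assigned to its nearest reference point, and the batch is summarized by per-reference counts and mean distances, so no individual data point needs to be inspected.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def profile_against_references(
    embeddings: List[Sequence[float]],
    references: Dict[str, Sequence[float]],
) -> Dict[str, Dict[str, float]]:
    """Summarize a batch by the nearest reference point of each embedding.

    Hypothetical helper: returns, for every reference that was the nearest
    one for at least one embedding, the number of embeddings assigned to it
    and their mean distance to it.
    """
    distances = defaultdict(list)
    for vec in embeddings:
        # Find the reference label with the smallest distance to this vector.
        label, dist = min(
            ((name, cosine_distance(vec, ref)) for name, ref in references.items()),
            key=lambda pair: pair[1],
        )
        distances[label].append(dist)
    return {
        label: {"count": len(d), "mean_distance": sum(d) / len(d)}
        for label, d in distances.items()
    }


# Toy 2-dimensional example; real embeddings have hundreds of dimensions.
references = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
batch = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
summary = profile_against_references(batch, references)
print(summary)
```

Comparing these summaries across batches gives a compact signal for monitoring: a shift in the counts suggests the mix of data types has changed, while a rising mean distance for one reference suggests that type of data is drifting away from what was expected.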