
The NeurIPS 2024 Preshow: A Label is Worth a Thousand Images in Dataset Distillation

What's this blog post about?

Researchers at Harvard University challenge the conventional wisdom in dataset distillation, arguing that informative probabilistic labels (soft labels) matter more than the synthetic images that distillation methods generate. Their paper, "A Label is Worth a Thousand Images in Dataset Distillation," was accepted at NeurIPS 2024 and examines the role soft labels play in dataset distillation methods. Soft labels carry structured information about the relationships between classes, capture semantic similarities, and act as regularizers during training. The paper also proposes a knowledge scaling law, which suggests that the best model for generating soft labels depends on how much data is available. Future directions include finding smarter ways to generate soft labels, investigating dataset distillation that does not rely on expert knowledge, and extending soft labels to other tasks such as object detection and natural language processing.
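To make the idea of a soft label concrete, here is a minimal sketch in plain Python. It is not the paper's method: the class names, the teacher logits, the student logits, and the temperature are all hypothetical values chosen for illustration. It shows how a teacher model's scores can be turned into a probability distribution over classes, and how a student's prediction can be scored against that distribution instead of a one-hot label.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens the result."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution (hard or soft) and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

# Hypothetical setup: three classes and made-up teacher scores for one cat image.
classes = ["cat", "dog", "truck"]
teacher_logits = [4.0, 2.5, -1.0]

# A hard label only says "cat". The soft label also says "dog" is a closer
# alternative than "truck", which is the class-relationship information
# the summary above refers to.
hard_label = [1.0, 0.0, 0.0]
soft_label = softmax(teacher_logits, temperature=2.0)

# Made-up student prediction for the same image.
student_probs = softmax([2.0, 1.0, 0.0])

hard_loss = cross_entropy(hard_label, student_probs)
soft_loss = cross_entropy(soft_label, student_probs)

for name, prob in zip(classes, soft_label):
    print(f"{name}: {prob:.3f}")
print(f"loss vs hard label: {hard_loss:.3f}")
print(f"loss vs soft label: {soft_loss:.3f}")
```

With a hard label, only the probability assigned to "cat" affects the loss. With a soft label, every class contributes, so the student is also penalized for ranking the unlikely classes differently from the teacher.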

Company
Voxel51

Date published
Dec. 4, 2024

Author(s)
Harpreet Sahota

Word count
925

Language
English

Hacker News points
None found.

