The NeurIPS 2024 Preshow: A Label is Worth a Thousand Images in Dataset Distillation
Researchers at Harvard University have challenged the conventional wisdom in dataset distillation, arguing that informative probabilistic labels (soft labels) contribute more to a distilled dataset's effectiveness than the synthetic images themselves. Their paper, "A Label is Worth a Thousand Images in Dataset Distillation," was accepted at NeurIPS 2024 and examines the role soft labels play in dataset distillation methods. Soft labels carry structured information about relationships between classes, capture semantic similarities, and act as regularizers during training. The paper also proposes a knowledge scaling law, under which the best model for generating soft labels depends on how much data is available. Future directions include finding smarter ways to generate soft labels, investigating dataset distillation that does not rely on expert knowledge, and extending soft labels to other tasks such as object detection and natural language processing.
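To make the idea of a soft label concrete, here is a minimal sketch in plain Python. It is not the paper's method: the class names, the teacher logits, and the temperature value are all hypothetical, chosen only to show how a probabilistic label can encode that a cat image resembles a dog more than a truck, which a one-hot hard label cannot express.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

classes = ["cat", "dog", "truck"]

# Hypothetical logits from a pretrained "expert" model for one cat image.
teacher_logits = [4.0, 2.5, -1.0]

# Hard label: all probability mass on the true class.
hard_label = [1.0, 0.0, 0.0]

# Soft label: the expert's softened distribution over all classes.
# The temperature of 2.0 is an illustrative assumption.
soft_label = softmax(teacher_logits, temperature=2.0)

# Hypothetical student predictions for the same image.
student_probs = softmax([2.0, 1.0, 0.0])

hard_loss = cross_entropy(hard_label, student_probs)
soft_loss = cross_entropy(soft_label, student_probs)

for name, prob in zip(classes, soft_label):
    print(f"{name}: {prob:.3f}")
print(f"loss with hard label: {hard_loss:.3f}")
print(f"loss with soft label: {soft_loss:.3f}")
```

In this sketch the soft label still ranks "cat" first but gives "dog" noticeably more mass than "truck". Training against that distribution penalizes the student for ignoring the cat/dog similarity, which is one way to read the claim that soft labels carry class relationships and act as regularizers.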
Company
Voxel51
Date published
Dec. 4, 2024
Author(s)
Harpreet Sahota
Word count
925
Language
English
Hacker News points
None found.