The NeurIPS 2024 Preshow: Data Quality Over Quantity: Why Real Images Still Reign Supreme for Vision Model Training

Company

Voxel51

Date Published

Dec. 6, 2024

Author

Harpreet Sahota

Word count

1220

Language

English

URL

voxel51.com/blog/the-neurips-2024-preshow-data-quality-over-quantity-why-real-images-still-reign-supreme-for-vision-model-training

Summary

The paper challenges the growing trend of training vision models on synthetic data. It shows that retrieving targeted real images from a dataset consistently outperforms using synthetic images generated by a text-to-image model. This finding underscores the need to evaluate synthetic data against a strong baseline of curated real data. The study highlights the limitations of current text-to-image models as a source of data for fine-tuning pre-trained vision models. It suggests that image generation must improve further before synthetic data can beat training directly on relevant real-world data.