The paper challenges the growing practice of training vision models on synthetic data. It shows that training on targeted real images retrieved from an existing dataset consistently outperforms training on synthetic images generated by a text-to-image model. This finding underscores the need to evaluate synthetic data against a strong baseline of curated real data. The study also exposes the limitations of current text-to-image models as a data source for fine-tuning pre-trained vision models. It suggests that image generation must improve further before synthetic images can surpass training directly on relevant real-world data.