The NeurIPS 2024 Preshow: More Than Meets the Eye: How Transformations Reveal the Hidden Biases Shaping Our Datasets

Company

Voxel51

Date Published

Dec. 6, 2024

Author

Harpreet Sahota

Word count

2349

Language

English

Hacker News points

None

URL

voxel51.com/blog/the-neurips-2024-preshow-more-than-meets-the-eye-how-transformations-reveal-the-hidden-biases-shaping-our-datasets

Summary

This paper explores bias in large-scale visual datasets, specifically YFCC, CC, and DataComp. The researchers use a framework of image transformations, each isolating a different type of visual attribute, to analyze semantic, structural, color, and frequency biases. They find that semantic bias plays a significant role in distinguishing the datasets, driven by distinct thematic focuses and object distributions. Structural bias is also present: object shapes and spatial configurations are strong indicators of dataset origin. Color bias appears in both high-frequency and low-frequency components, and frequency bias adds to the visual distinctiveness of each dataset. The findings suggest that, despite efforts to improve diversity, large-scale datasets still exhibit significant biases that can affect model generalizability and robustness. By applying such transformations and analyzing the outputs, researchers and practitioners can gain insight into the visual characteristics and potential biases of their own datasets.
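To make the idea of attribute-isolating transformations concrete, here is a minimal sketch in Python, assuming only NumPy. The function names (`mean_color`, `frequency_split`), the `cutoff` value, and the choice of an ideal FFT mask are illustrative assumptions, not the transformations or settings used in the paper. The first function keeps only color information; the second separates low-frequency from high-frequency content.

```python
import numpy as np

def mean_color(image):
    """Replace every pixel with the image's mean color (discards structure)."""
    mean = image.reshape(-1, image.shape[-1]).mean(axis=0)
    return np.broadcast_to(mean, image.shape).copy()

def frequency_split(image, cutoff=0.1):
    """Split an H x W x C image into low- and high-frequency parts.

    Uses an ideal (hard) circular mask in the 2D Fourier domain.
    `cutoff` is in cycles per pixel and is a hypothetical default.
    """
    h, w = image.shape[:2]
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies
    low_mask = (np.sqrt(fx**2 + fy**2) <= cutoff)[..., None]
    spectrum = np.fft.fft2(image, axes=(0, 1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(0, 1)).real
    high = image - low                # residual holds the high frequencies
    return low, high

# Stand-in for a real image: random values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))

flat = mean_color(image)
low, high = frequency_split(image)
```

One way to use such transformations is to apply the same one to images from several datasets and then check whether a classifier can still tell the datasets apart from the transformed images alone. If it can, the attribute that the transformation preserves likely carries dataset-specific bias.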