Automatically Reducing AI Bias With Synthetic Data
This blog provides a step-by-step guide to using Gretel's SDKs to create a fair, balanced, privacy-preserving version of the 1994 US Census dataset. The process involves boosting underrepresented classes such as race, gender, and income bracket, and the accompanying Python notebook can be applied to any imbalanced dataset. Gretel's SDK offers two modes: "full" mode generates a complete synthetic dataset with representation bias removed, while "additive" mode generates only the synthetic samples needed to remove bias when added to the original set. The blueprint also lets users inspect the existing categorical field distributions in the dataset and generate synthetic data for specific fields. Finally, users can save the new synthetic data locally or write it back to a Gretel project.
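To make the "additive" idea concrete, here is a minimal Python sketch (not the Gretel SDK's actual API; the dataset URL and column names are assumptions for illustration) showing how one might compute the number of synthetic records each underrepresented class needs so that, once added to the original data, each categorical field is balanced:

```python
# Illustrative sketch only: estimating per-class synthetic record counts for
# an "additive"-style rebalancing, where synthetic samples are generated just
# to top up minority classes. This is not the Gretel SDK's API.
import pandas as pd

def additive_counts(df: pd.DataFrame, field: str) -> pd.Series:
    """Return how many synthetic rows per category would equalize `field`."""
    counts = df[field].value_counts()
    # Top every category up to the size of the largest one.
    return counts.max() - counts

# 1994 US Census ("adult") dataset; URL and column names are assumptions.
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None,
    names=["age", "workclass", "fnlwgt", "education", "education_num",
           "marital_status", "occupation", "relationship", "race", "sex",
           "capital_gain", "capital_loss", "hours_per_week",
           "native_country", "income"],
    skipinitialspace=True,
)

for field in ["race", "sex", "income"]:
    print(f"Synthetic records needed per '{field}' class (additive mode):")
    print(additive_counts(df, field), "\n")
```

In "full" mode, by contrast, the entire dataset is regenerated synthetically with the target distributions already balanced, rather than only the top-up records being produced.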
Company
Gretel.ai
Date published
Jan. 9, 2021
Author(s)
Amy Steier
Word count
679
Language
English
Hacker News points
1