Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data.
The post describes a method for generating synthetic datasets with machine learning while preserving the privacy of individual users. It explains the many uses of public datasets and how privacy regulations such as GDPR and CCPA protect individuals' data. It then reviews established techniques, namely data anonymization, generalization, and perturbation, which retain a dataset's statistical insights while reducing the risk of exposing personal information. Next, it explores combining machine learning with differential privacy to strengthen those protections further. The post also discusses the underlying privacy challenge, including how re-identification attacks can compromise individual privacy. It proposes training a generative neural network on a public bike-share dataset in the General Bikeshare Feed Specification (GBFS) format to produce a synthetic dataset with stronger protection for individuals. It concludes that such an artificial dataset can support the same statistical insights and applications as the original while offering improved privacy for its users.
Company
Gretel.ai
Date published
Sept. 14, 2020
Author(s)
Alex Watson
Word count
2683
Hacker News points
1
Language
English