Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data.

Company

Gretel.ai

Date Published

Sept. 14, 2020

Author

Alex Watson

Word count

2683

Language

English

Hacker News points

URL

gretel.ai/blog/using-generative-differentially-private-models-to-build-privacy-enhancing-synthetic-datasets-from-real-data

Summary

The article describes a method for generating synthetic datasets with machine learning while protecting the privacy of individual users. It notes the many uses of public datasets and the role of privacy regulations such as GDPR and CCPA in protecting personal data. Anonymization, generalization, and perturbation are presented as ways to preserve a dataset's statistical insights while reducing the risk of revealing personal information, and the article explains how re-identification attacks can still compromise individual privacy. To strengthen privacy further, it proposes combining machine learning with differential privacy: a generative neural network is trained on a public ride-sharing dataset (GBFS) to produce an artificial, synthetic dataset that offers statistical insights and applications similar to those of the original, with stronger protection for individual privacy.
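
As a rough illustration of two of the simpler techniques named in the summary, the sketch below applies generalization (coarsening a coordinate by rounding) and perturbation (adding Laplace noise) to a single location value. It is not the article's generative-model approach. The function names, the example coordinate, and the `epsilon` and `sensitivity` values are assumptions chosen for illustration only.

```python
import random


def generalize_coordinate(value: float, decimals: int = 2) -> float:
    """Generalization: coarsen a coordinate by rounding it to fewer decimals."""
    return round(value, decimals)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_coordinate(value: float, epsilon: float, sensitivity: float,
                       rng: random.Random) -> float:
    """Perturbation: add Laplace noise with scale = sensitivity / epsilon.

    `sensitivity` and `epsilon` are illustrative assumptions, not values
    taken from the article.
    """
    return value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so the sketch is repeatable
    latitude = 34.052235      # hypothetical trip start point
    print(generalize_coordinate(latitude))
    print(perturb_coordinate(latitude, epsilon=1.0, sensitivity=0.01, rng=rng))
```

Smaller `epsilon` values produce larger noise and therefore stronger privacy at the cost of accuracy. The same trade-off applies when differential privacy is used during model training, although the mechanism there differs from this per-value sketch.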