How To Create Differentially Private Synthetic Data
This post is a practical guide to creating differentially private synthetic data with Python and TensorFlow. It shows how to train a synthetic data model on the Netflix Prize dataset while protecting user identities through differential privacy. The goal is to generate new data in the same format as the source data, with stronger privacy guarantees, while retaining the statistical insights of the original. The post discusses approaches to tuning the privacy parameters and presents experiments that use the gretel-synthetics library and TensorFlow-Privacy. It also explores how adjusting the learning rate, l2_norm_clip, and noise_multiplier can improve model accuracy while maintaining privacy guarantees. The final section encourages readers to generate synthetic datasets from their own data using the provided Jupyter notebook.
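To make the three tuned parameters concrete, here is a minimal NumPy sketch of a single differentially private SGD update. It is an illustration of the general technique, not the post's code and not the gretel-synthetics or TensorFlow-Privacy API: the function name dp_sgd_step and its arguments are hypothetical, and it assumes per-example gradients are already available as rows of an array. Each gradient is clipped to an L2 norm of at most l2_norm_clip, Gaussian noise with standard deviation noise_multiplier * l2_norm_clip is added to the sum, and the averaged result is applied with the learning rate.

```python
import numpy as np


def dp_sgd_step(weights, per_example_grads, learning_rate,
                l2_norm_clip, noise_multiplier, rng):
    """One illustrative DP-SGD update (hypothetical helper, not a library API).

    weights:            array of shape (d,)
    per_example_grads:  array of shape (batch_size, d), one gradient per example
    """
    # Clip each example's gradient so its L2 norm is at most l2_norm_clip.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise calibrated to the clipping bound to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=weights.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

    # Ordinary gradient descent step on the noisy, clipped average.
    return weights - learning_rate * noisy_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.zeros(2)
    grads = np.array([[3.0, 4.0], [0.1, -0.2]])
    w_next = dp_sgd_step(w, grads, learning_rate=0.01,
                         l2_norm_clip=1.0, noise_multiplier=0.5, rng=rng)
    print(w_next)
```

In this sketch a larger noise_multiplier gives more privacy but noisier updates, and a smaller l2_norm_clip limits the influence of any single record at the cost of shrinking large gradients, which is why these values are typically tuned together with the learning rate.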
Company
Gretel.ai
Date published
Jan. 9, 2021
Author(s)
Alex Watson
Word count
1073
Hacker News points
1
Language
English