Practical Privacy with Synthetic Data
This post walks through the implementation of a practical attack on synthetic data models to measure unintended memorization in neural networks, as described in "Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks" by Nicholas Carlini et al. The attack is used to evaluate how well synthetic data models trained with various neural network and differential privacy parameter settings protect sensitive data and secrets in a dataset. The experiments use a smaller dataset containing sensitive location data, which is considered challenging to anonymize. Canary values are inserted into each model's training data, and each model's propensity to memorize and replay those canaries is then measured. The results show that differential privacy prevented memorization of secrets across all tested configurations, and that gradient clipping alone also prevented any replay of canary values with only a small loss in model accuracy.
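As a rough illustration of the canary technique described above (not the post's or Gretel's actual code), the Python sketch below shows one way to insert canary records into a training set and later measure how often a synthetic data model replays them verbatim. The function names, canary format, and toy location records are all assumptions made for the example.

```python
import random
import string


def make_canary(prefix="canary", length=8, seed=None):
    """Generate a synthetic secret to plant in the training data.
    The format is hypothetical; any out-of-distribution token works."""
    rng = random.Random(seed)
    secret = "".join(rng.choices(string.digits, k=length))
    return f"{prefix}: {secret}"


def insert_canaries(records, num_canaries=3, repetitions=1, seed=0):
    """Return a shuffled copy of the training records with canaries mixed in."""
    rng = random.Random(seed)
    canaries = [make_canary(seed=seed + i) for i in range(num_canaries)]
    augmented = list(records)
    for canary in canaries:
        augmented.extend([canary] * repetitions)
    rng.shuffle(augmented)
    return augmented, canaries


def replay_rate(generated_samples, canaries):
    """Fraction of canaries reproduced verbatim in the model's synthetic output.
    A non-zero rate indicates unintended memorization."""
    hits = sum(1 for c in canaries if any(c in g for g in generated_samples))
    return hits / len(canaries)


if __name__ == "__main__":
    # Toy stand-in for a sensitive location dataset.
    training_data = [f"user_{i} visited grid_cell_{i % 40}" for i in range(1000)]
    augmented, canaries = insert_canaries(training_data, num_canaries=3)

    # Train a synthetic data model on `augmented` here (not shown), then sample
    # from it and check how often the planted canaries come back.
    fake_samples = augmented[:200]  # placeholder for the model's generated output
    print("canary replay rate:", replay_rate(fake_samples, canaries))
```

In the actual experiment, this replay check would be run against each model configuration (with and without differential privacy or gradient clipping) to compare how often the planted secrets resurface.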
Company
Gretel.ai
Date published
April 27, 2021
Author(s)
Alex Watson
Word count
1003
Hacker News points
None found.
Language
English