Red Teaming Synthetic Data Models
This post describes a practical attack on synthetic data models, used to check how well they protect sensitive information under different parameter settings. The author uses a credit card transaction fraud detection dataset from Kaggle and selects four features that are informative but sensitive. The attack measures memorization: secret values, or "canaries," are planted in the training data, and the synthetic output is checked for how often the model reproduces them. The synthetic model is trained under four combinations of neural network and privacy settings. Each resulting model is evaluated on overall prediction accuracy and assigned a Synthetic Quality Score (SQS). The post concludes that the choice of parameters and privacy settings depends on the user's priorities: how strongly sensitive values must be protected versus how much model performance is required.
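The canary test summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the post's code or Gretel's API: the canary format (a random 12-character uppercase alphanumeric string), the helper names `make_canary`, `insert_canaries`, and `count_canary_leaks`, the `merchant` field, and the choice to append canary rows rather than overwrite existing ones are all hypothetical. Training the synthetic model and sampling from it are deliberately left out.

```python
import random
import string


def make_canary(rng: random.Random, length: int = 12) -> str:
    """Generate a random secret that is unlikely to occur naturally in the data."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def insert_canaries(records, field, canaries, repeats, rng):
    """Return a shuffled copy of `records` with extra canary rows appended.

    For each canary, `n` rows are cloned from randomly chosen real records and
    their `field` value is replaced by the canary (hypothetical design choice:
    appending avoids one canary overwriting another).
    """
    poisoned = [dict(r) for r in records]
    for canary, n in zip(canaries, repeats):
        for _ in range(n):
            row = dict(rng.choice(records))
            row[field] = canary
            poisoned.append(row)
    rng.shuffle(poisoned)
    return poisoned


def count_canary_leaks(synthetic, field, canaries):
    """Count how many times each canary appears verbatim in the synthetic `field`."""
    counts = {c: 0 for c in canaries}
    for row in synthetic:
        value = row.get(field)
        if value in counts:
            counts[value] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    records = [{"merchant": f"m{i}", "amount": i} for i in range(50)]
    canaries = [make_canary(rng) for _ in range(3)]
    training_data = insert_canaries(records, "merchant", canaries, [1, 5, 10], rng)
    # In a real test, a synthetic model would be trained on training_data and
    # sampled; here a tiny hand-made "synthetic" output stands in for it.
    synthetic = [{"merchant": canaries[2]}, {"merchant": "m3"}]
    print(count_canary_leaks(synthetic, "merchant", canaries))
```

Any canary that shows up in the generated records is evidence that the model memorized a training value rather than learning general patterns; repeating the count for each parameter and privacy setting gives a simple way to compare how much each configuration leaks.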
Company
Gretel.ai
Date published
June 2, 2022
Author(s)
Marjan Emadi
Word count
1252
Language
English
Hacker News points
5