Red Teaming Synthetic Data Models

Company

Gretel.ai

Date Published

June 2, 2022

Author

Marjan Emadi

Word count

1252

Language

English

Hacker News points

URL

gretel.ai/blog/red-teaming-synthetic-data-models

Summary

This post describes a practical attack on synthetic data models, used to validate how well they protect sensitive information under different parameter settings. The authors use a credit card transaction fraud detection dataset from Kaggle and select four informative yet sensitive features for the experiment. They implement the attack by measuring how much the model memorizes secret values, or "canaries," and reproduces them in the data it generates. Synthetic datasets are generated under four different neural network and privacy settings, and each model is evaluated on overall prediction accuracy and assigned a synthetic quality score (SQS). The authors conclude that the right parameters and privacy settings depend on how the user weighs protecting sensitive values against achieving desirable performance.
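The canary measurement summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `insert_canaries` and `count_leaked_canaries`, the column name `sensitive_feature`, and the canary strings are all hypothetical, and the step that trains a synthetic model and samples from it is left out entirely.

```python
import random


def insert_canaries(rows, column, canaries, copies, seed=0):
    """Return a copy of `rows` with each canary written into `column`
    of `copies` distinct, randomly chosen rows (hypothetical helper)."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]  # copy each row so the input is untouched
    # Sample without replacement so no row receives two canaries.
    targets = rng.sample(range(len(out)), k=len(canaries) * copies)
    for i, idx in enumerate(targets):
        out[idx][column] = canaries[i // copies]
    return out


def count_leaked_canaries(synthetic_rows, column, canaries):
    """Count how often each canary appears verbatim in `column`
    of the generated rows (hypothetical helper)."""
    counts = {c: 0 for c in canaries}
    for row in synthetic_rows:
        value = row.get(column)
        if value in counts:
            counts[value] += 1
    return counts


if __name__ == "__main__":
    # Toy stand-in for the training table; the column name is an assumption.
    rows = [{"sensitive_feature": f"value-{i}"} for i in range(100)]
    canaries = ["CANARY-A", "CANARY-B"]
    training_rows = insert_canaries(rows, "sensitive_feature", canaries, copies=5)

    # In a real experiment, a synthetic model would be trained on
    # `training_rows` and `synthetic_rows` would be sampled from it.
    # Here the training rows are reused only to exercise the counter.
    synthetic_rows = training_rows
    print(count_leaked_canaries(synthetic_rows, "sensitive_feature", canaries))
```

Running the same count against output from models trained with different settings gives a simple way to compare how many inserted canaries each one reproduces.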