Quantifying PII Exposure in Synthetic Data
Gretel's PII Replay is a new privacy metric that identifies sensitive values in the original training data and counts how often those same values appear in the synthetic output. It works alongside Membership Inference Protection and Attribute Inference Protection to help keep synthetic data private by design. Because Gretel Transform is used to identify and classify PII in the original training data, users can see directly whether any of that original PII shows up in their synthetic data. Strategies to minimize PII Replay include running Transform before generating synthetics, choosing a model designed to minimize PII replay, applying differential privacy, removing unnecessary columns during pre-processing, and combining pre- and post-processing steps deliberately.
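The counting idea behind the metric can be illustrated with a minimal sketch. This is not Gretel's implementation or API: the function name `pii_replay_counts`, the row-as-dictionary layout, and the sample records are all hypothetical, and the sketch assumes the PII columns have already been identified (the step the post attributes to Gretel Transform). It only does exact-match comparison per column.

```python
from collections import Counter


def pii_replay_counts(train_rows, synth_rows, pii_columns):
    """Hypothetical helper: per PII column, count training values that
    reappear verbatim in the synthetic rows (exact match only)."""
    report = {}
    for col in pii_columns:
        # Distinct non-empty values seen in the training data for this column.
        train_values = {
            row[col] for row in train_rows if row.get(col) not in (None, "")
        }
        # How many times each non-empty value occurs in the synthetic data.
        synth_counter = Counter(
            row[col] for row in synth_rows if row.get(col) not in (None, "")
        )
        # Training values that were replayed, with their synthetic frequency.
        replayed = {v: synth_counter[v] for v in train_values if v in synth_counter}
        report[col] = {
            "unique_training_values": len(train_values),
            "unique_values_replayed": len(replayed),
            "synthetic_occurrences": sum(replayed.values()),
        }
    return report


# Made-up example records, for illustration only.
train = [
    {"name": "Ada Park", "email": "ada@example.com"},
    {"name": "Ben Ortiz", "email": "ben@example.com"},
]
synth = [
    {"name": "Ada Park", "email": "kim@example.org"},
    {"name": "Cy Lane", "email": "ben@example.com"},
    {"name": "Ada Park", "email": "zed@example.org"},
]

report = pii_replay_counts(train, synth, ["name", "email"])
```

A nonzero count is not automatically a leak: common values such as frequent first names can recur by chance, so counts like these are best read per column and in context.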
Company
Gretel.ai
Date published
Nov. 22, 2024
Author(s)
Alexa Haushalter
Word count
2382
Language
English
Hacker News points
None found.