
Quantifying PII Exposure in Synthetic Data

What's this blog post about?

Gretel's PII Replay is a new privacy metric that identifies sensitive values in the original training data and counts how often those values reappear in the synthetic output. It works alongside Membership Inference Protection and Attribute Inference Protection to help keep synthetic data private by design. PII Replay relies on Gretel Transform to identify and classify PII in the original training data, so users can easily see whether any of that PII shows up in their synthetic data. Strategies to minimize PII replay include running Transform before generating synthetics, choosing a model designed to minimize PII replay, applying differential privacy, removing unnecessary columns during pre-processing, and combining pre- and post-processing steps deliberately.
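The counting idea behind a replay metric can be illustrated with a minimal sketch. This is not Gretel's implementation or API: the function name `pii_replay_counts`, the report keys, and the sample rows are all hypothetical, the PII columns are listed by hand rather than detected by Gretel Transform, and a value is assumed to count as replayed only on an exact string match.

```python
from collections import Counter


def pii_replay_counts(training_rows, synthetic_rows, pii_columns):
    """Hypothetical helper: per PII column, count training values that reappear in synthetic rows."""
    report = {}
    for col in pii_columns:
        # Unique non-empty values seen in the training data for this column.
        train_values = {row[col] for row in training_rows if row.get(col)}
        # How often each non-empty value occurs in the synthetic data.
        synth_counter = Counter(row[col] for row in synthetic_rows if row.get(col))
        # Training values that show up verbatim in the synthetic output (exact match is an assumption).
        replayed = {v: synth_counter[v] for v in train_values if v in synth_counter}
        report[col] = {
            "unique_training_values": len(train_values),
            "unique_values_replayed": len(replayed),
            "synthetic_occurrences": sum(replayed.values()),
        }
    return report


# Made-up sample rows, for illustration only.
training = [
    {"name": "Ada Park", "email": "ada@example.com"},
    {"name": "Bo Lin", "email": "bo@example.com"},
]
synthetic = [
    {"name": "Ada Park", "email": "x1@example.com"},
    {"name": "Ada Park", "email": "x2@example.com"},
    {"name": "Cy Roe", "email": "bo@example.com"},
]

report = pii_replay_counts(training, synthetic, ["name", "email"])
for column, stats in report.items():
    print(column, stats)
```

In this sketch, a replayed name that appears twice in the synthetic rows is counted as one replayed value with two occurrences, which keeps "how many distinct values leaked" separate from "how often they appear".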

Company
Gretel.ai

Date published
Nov. 22, 2024

Author(s)
Alexa Haushalter

Word count
2382

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.