Company
Date Published
Oct. 31, 2024
Author
Maarten Van Segbroeck
Word count
991
Language
English
Hacker News points
None

Summary

Gretel has developed synthetic documents enriched with a wide variety of PII and PHI entities, so that entity detection can be improved without exposing real personal data. The gretelai/gretel-pii-masking-en-v1 dataset, created using Gretel Navigator, simulates real-world document excerpts containing sensitive information across multiple industries and document types. Because the scenarios are diverse and fully synthetic, developers can fine-tune PII and PHI detection models on them while maintaining privacy compliance. GLiNER models fine-tuned on this dataset achieve significantly higher metrics than their base model counterparts. The fine-tuned models are aimed at applications in healthcare, finance, and other domains that need accurate PII and PHI detection while complying with privacy regulations.
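
To make the masking step concrete, here is a minimal sketch of how detected entity spans could be turned into masked text. It is not Gretel's implementation: the helper name `mask_entities`, the span format (dictionaries with `start`, `end`, and `label` keys), and the sample spans are all assumptions made for illustration. In practice the spans would come from an entity detection model rather than being written by hand.

```python
from typing import Dict, List


def mask_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a placeholder such as [EMAIL].

    Assumes non-overlapping spans given as character offsets into `text`
    (hypothetical format: {"start": int, "end": int, "label": str}).
    """
    masked = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['label'].upper()}]"
        masked = masked[: ent["start"]] + placeholder + masked[ent["end"]:]
    return masked


# Hand-written spans standing in for model output (illustrative only).
sample_text = "Contact Jane Doe at jane.doe@example.com."
sample_entities = [
    {"start": 8, "end": 16, "label": "name"},
    {"start": 20, "end": 40, "label": "email"},
]
masked_text = mask_entities(sample_text, sample_entities)
print(masked_text)
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution. Overlapping spans would need to be merged or filtered before calling the helper.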