GLiNER Models for PII Detection through Fine-Tuning on Gretel-Generated Synthetic Documents
Gretel has developed synthetic documents containing a wide variety of PII and PHI entities, intended to improve entity detection without exposing real personal data. The gretelai/gretel-pii-masking-en-v1 dataset, created using Gretel Navigator, simulates excerpts of real-world documents containing sensitive information across multiple industries and document types. Its range of scenarios gives developers material for fine-tuning PII and PHI detection models without handling real personal data. GLiNER models fine-tuned on this dataset achieve significantly higher metrics than their base-model counterparts. The fine-tuned models are aimed at applications in healthcare, finance, and other domains where PII and PHI must be detected accurately to comply with privacy regulations.
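As a rough illustration of how detected entities might be used downstream, the sketch below masks PII spans in a text. It assumes the detector returns a list of dicts with "start", "end", "label", and "score" keys, which is the shape GLiNER's predict_entities produces; the model call itself is not shown. The mask_pii and make_span helpers, the "name" and "ssn" labels, the scores, and the sample sentence are all hypothetical and do not come from the dataset or the article.

```python
from typing import Any, Dict, List

Span = Dict[str, Any]


def mask_pii(text: str, spans: List[Span], threshold: float = 0.5) -> str:
    """Replace each sufficiently confident span with a [LABEL] placeholder."""
    # Keep only spans at or above the confidence threshold.
    kept = [s for s in spans if s.get("score", 1.0) >= threshold]
    # Work from right to left so earlier character offsets stay valid.
    kept.sort(key=lambda s: s["start"], reverse=True)

    masked = text
    last_start = len(text)
    for s in kept:
        # Skip a span that overlaps one already replaced to its right.
        if s["end"] > last_start:
            continue
        placeholder = "[" + str(s["label"]).upper() + "]"
        masked = masked[: s["start"]] + placeholder + masked[s["end"]:]
        last_start = s["start"]
    return masked


def make_span(text: str, value: str, label: str, score: float) -> Span:
    """Hypothetical helper: build a detector-style span for a known substring."""
    start = text.index(value)
    return {"start": start, "end": start + len(value), "label": label, "score": score}


# Hypothetical input and detections, standing in for real model output.
example_text = "Patient Jane Doe, SSN 123-45-6789, was admitted."
example_spans = [
    make_span(example_text, "Jane Doe", "name", 0.97),
    make_span(example_text, "123-45-6789", "ssn", 0.93),
    make_span(example_text, "admitted", "name", 0.12),  # low score, dropped
]

masked = mask_pii(example_text, example_spans)
print(masked)
```

In practice the spans would come from a fine-tuned model rather than being built by hand, and the threshold would be tuned against a labeled validation set.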
Company
Gretel.ai
Date published
Oct. 31, 2024
Author(s)
Maarten Van Segbroeck
Word count
991
Hacker News points
None found.
Language
English