How we accidentally discovered personal data in a popular Kaggle dataset
Upcoming features in the Gretel Public Beta include automatic data labeling, which combines Natural Language Processing (NLP) and neural-network-based entity recognition for names and addresses with managed regular expressions and custom extractors. These features make it possible to discover personally identifiable information (PII), such as full names and email addresses, in datasets like Lending Club's financial dataset on Kaggle. Gretel helps developers share data more safely by providing workflows for understanding a dataset's contents and making informed decisions about its safety.
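As a rough illustration of the regular-expression side of this kind of labeling, the sketch below scans every field of a set of records for email-like strings. It is not Gretel's implementation or API: the pattern is deliberately simplified, and the `find_emails` helper and the sample rows are hypothetical names and data made up for this example.

```python
import re

# Simplified email pattern for illustration only; an assumption,
# not the managed regular expression Gretel ships.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(records):
    """Return (row_index, field, match) for every email-like string found."""
    hits = []
    for i, record in enumerate(records):
        for field, value in record.items():
            # Convert to str so numeric or missing values do not raise.
            for match in EMAIL_RE.findall(str(value)):
                hits.append((i, field, match))
    return hits


# Hypothetical rows with free-text fields, invented for this sketch.
sample = [
    {"emp_title": "Teacher", "desc": "Debt consolidation"},
    {"emp_title": "Nurse", "desc": "Contact me at jane.doe@example.com"},
]

for row, field, email in find_emails(sample):
    print(f"row {row}, field {field!r}: {email}")
```

A pattern like this only catches well-structured identifiers such as email addresses. Finding full names and addresses in free text generally needs the entity-recognition approach the summary mentions.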
Company
Gretel.ai
Date published
Aug. 24, 2020
Author(s)
John Myers
Word count
923
Language
English
Hacker News points
1