Innovating With FastText and Table Headers
FastText word embeddings can be used to quickly understand new datasets and build more consistent labels for structured data such as tables, JSON, or CSV files. The technique uses a FastText model pre-trained on schema examples drawn from large collections of data. The model helps find synonyms, abbreviations, and other variations of field headers, which is useful when designing new table schemas or assessing whether two tables can be joined. It can also help enforce standardization policies across multiple internal data sources by comparing suggested headers with company standards.
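The header-matching idea can be sketched without a trained model. The example below is a simplified stand-in, not the article's FastText model: it compares raw character n-gram counts (the subword units FastText builds its vectors from) using cosine similarity, so it catches spelling variants and abbreviations but not true synonyms, which need learned embeddings. The function names and the `STANDARD_HEADERS` list are hypothetical and invented for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(header, n_min=3, n_max=5):
    """Count character n-grams of a header, using FastText-style < > boundary markers."""
    token = "<" + header.lower().replace(" ", "_") + ">"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams[token[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_headers(header, vocabulary, top_k=3):
    """Rank known headers by similarity to a new header, best match first."""
    query = char_ngrams(header)
    scored = [(cosine(query, char_ngrams(known)), known) for known in vocabulary]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top_k]]


# Hypothetical company-standard headers (assumption, not from the article).
STANDARD_HEADERS = ["first_name", "last_name", "email_address", "phone_number", "zip_code"]

if __name__ == "__main__":
    for raw in ["FirstName", "e-mail", "phone_num"]:
        print(raw, "->", suggest_headers(raw, STANDARD_HEADERS, top_k=1))
```

With a trained embedding model, `char_ngrams` and `cosine` would be replaced by a nearest-neighbor lookup over learned vectors; the ranking step in `suggest_headers` would stay the same. The same ranking can be run across the headers of two tables to flag columns that are likely join candidates.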
Company
Gretel.ai
Date published
Aug. 20, 2020
Author(s)
Amy Steier
Word count
974
Language
English
Hacker News points
None found.