🔭 What is NER And Why It’s Hard to Get Right

Company

Galileo

Date Published

May 27, 2022

Author

Ben Epstein

Word count

944

Language

English

Hacker News points

None

URL

www.galileo.ai/blog/what-is-ner-and-why-it-s-hard-to-get-right

Summary

Named Entity Recognition (NER) is an important component of many Natural Language Processing (NLP) pipelines, and it is particularly hard to improve because annotating text data is nuanced and complex. NER involves identifying the words or spans in a sample that belong to specific label categories, such as person or location. In text classification, each sentence is assigned a single category. In NER, a single sample can contain several labeled spans, which multiplies the number of possible outputs. Collecting high-quality NER data is time-consuming and often requires domain experts, which limits how much training data is available. As a result, NER systems are often combined with rule-based features, or built by fine-tuning pre-trained language models on custom data. The black-box nature of NER models makes them hard to inspect and makes it hard to judge how well they generalize. Galileo's data-centric approach aims to address these challenges by surfacing error patterns and providing granular insights. By reducing the complexity of the data structure and tagging schemas, Galileo aims to make model iterations more efficient and improve the performance of NER systems.
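
To make the difference between span-level labels and sentence-level labels concrete, here is a minimal Python sketch that converts token-level span annotations into per-token tags. It assumes the BIO tagging schema (B = beginning of a span, I = inside, O = outside), which the summary does not name. The function `spans_to_bio`, the example sentence, and the `PER`/`LOC` labels are hypothetical and chosen only for illustration; this is not Galileo's implementation.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label).
# This representation is an assumption made for this sketch.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping token spans into one BIO tag per token."""
    tags = ["O"] * num_tokens  # every token starts outside any entity
    for start, end, label in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


# Hypothetical sample: one sentence, two entity spans.
tokens = "Maria Lopez flew to Paris yesterday".split()
spans = [(0, 2, "PER"), (4, 5, "LOC")]
tags = spans_to_bio(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
# tags == ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
```

A text classifier would emit one label for this sentence, while an annotator or NER model has to get six token-level decisions and two span boundaries right, which is one reason labeling errors are both more common and harder to spot.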