Company
Date Published
Author
Zach Blumenfeld
Word count
3901
Language
English
Hacker News points
None

Summary

The article covers supervised entity resolution (ER) in Neo4j, a graph database. ER is the process of disambiguating data to determine whether multiple records represent the same real-world entity, which matters in industries such as online advertising, marketing, and law enforcement. The article explains how a graph represents the associations between subjects as paths made up of nodes and relationships, and why feature engineering, data sampling, and hyperparameter configuration matter when developing a supervised machine learning pipeline for entity linking. The author demonstrates how to build such a pipeline with Neo4j's Graph Data Science (GDS) library: creating a graph projection, generating node embeddings, configuring the link prediction pipeline, training the model, predicting new entity linkages, and querying the resolved person information. The article concludes by highlighting the potential of ER across industries and encouraging readers to experiment with the GDS library in their own projects.
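
The last two steps named in the summary, predicting entity links and then querying resolved entities, can be illustrated with a minimal plain-Python sketch. It does not use Neo4j or the GDS API. The record IDs, the probabilities, the 0.5 threshold, and the `resolve_entities` helper are all hypothetical, and grouping records by connected components of predicted links is an assumption about how resolved entities are formed, not something the summary confirms.

```python
from collections import defaultdict


def resolve_entities(record_ids, predicted_links, threshold=0.5):
    """Group records into entities from predicted same-entity links.

    predicted_links holds (record_a, record_b, probability) tuples, standing in
    for the output of a trained link prediction model (hypothetical values).
    """
    # Union-find: every record starts as its own entity.
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root ID under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Keep only links the model is confident about.
    for a, b, probability in predicted_links:
        if probability >= threshold:
            union(a, b)

    # Collect records that share a root into one resolved entity.
    groups = defaultdict(list)
    for r in record_ids:
        groups[find(r)].append(r)
    return sorted(sorted(group) for group in groups.values())


records = ["u1", "u2", "u3", "u4", "u5"]
links = [("u1", "u2", 0.93), ("u2", "u3", 0.81), ("u4", "u5", 0.12)]
entities = resolve_entities(records, links)
```

Here `u1`, `u2`, and `u3` resolve to one entity because confident links chain them together, while `u4` and `u5` stay separate because their only link falls below the threshold. Lowering the threshold merges more records at the cost of more false matches.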