MinHash Based Fuzzy Match on Graph
Fuzzy matching finds phrases in a database that are similar but not identical. In a graph, fuzzy string matching is challenging because vertices with similar string attribute values have no direct link between them, so a traversal needs some indirect connection to get from one to the other. Applying MinHash during graph loading can help: the loading process creates vertices that use MinHash signatures as their IDs, and vertices with similar strings become connected through these common intermediary vertices. This method maintains performance, stores the data natively in the graph database, and allows search through graph traversal. TigerGraph, a massively parallel processing graph analytics platform, supports implementing MinHash fuzzy matching by defining a schema, creating loading jobs, and running queries that use Jaccard similarity or string distance functions.
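The idea summarized above can be illustrated with a minimal Python sketch. It is not TigerGraph's implementation: the function names, the 3-character shingles, the 16 hash functions, and the use of BLAKE2b as the hash are all assumptions chosen for illustration, and a plain dictionary stands in for the signature vertices that the graph would hold. Each (slot, min-hash value) key plays the role of an intermediary vertex, and candidates reached through a shared key are then verified with Jaccard similarity.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased, stripped string."""
    s = text.lower().strip()
    if len(s) <= k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(token, seed):
    # Deterministic 64-bit hash; the seed selects one of the hash functions.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=16, k=3):
    """One minimum hash value per hash function over the string's shingles."""
    grams = shingles(text, k)
    return [min(_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def jaccard(a, b, k=3):
    """Exact Jaccard similarity of the two strings' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


def build_index(names, num_hashes=16, k=3):
    """Link each name to its signature keys (stand-ins for signature vertices)."""
    index = defaultdict(set)
    for name in names:
        for slot, value in enumerate(minhash_signature(name, num_hashes, k)):
            index[(slot, value)].add(name)
    return index


def fuzzy_match(query, index, num_hashes=16, k=3, threshold=0.5):
    """Collect names sharing any signature key, then keep those above threshold."""
    candidates = set()
    for slot, value in enumerate(minhash_signature(query, num_hashes, k)):
        candidates |= index.get((slot, value), set())
    scored = [(jaccard(query, c, k), c) for c in candidates]
    return sorted(((s, c) for s, c in scored if s >= threshold), reverse=True)
```

In this sketch, two strings share a given signature slot with probability equal to their Jaccard similarity, so more hash functions raise the chance that similar strings meet at a common key, at the cost of more keys per string.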
Company
TigerGraph
Date published
Sept. 6, 2022
Author(s)
Xinyu Chang
Word count
1591
Hacker News points
None found.
Language
English