
MinHash Based Fuzzy Match on Graph

What's this blog post about?

Fuzzy matching is a method for finding phrases in a database that are similar to, but do not exactly match, a given string. In a graph, string fuzzy matching is challenging because vertices with similar string attribute values must be connected indirectly. Applying MinHash during graph loading addresses this: vertices are created with MinHash signatures as their IDs, and similar strings are connected through shared intermediary vertices. This approach maintains performance, stores the data natively in the graph database, and enables search through graph traversal. TigerGraph, a massively parallel processing (MPP) graph analytics platform, supports implementing MinHash fuzzy matching by defining a schema and loading jobs, then running queries that use Jaccard similarity or string distance functions.
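The idea can be illustrated with a minimal in-memory sketch. This is not the post's GSQL schema, loading job, or query; it models the graph with plain Python dictionaries under several assumptions: strings are tokenized into character 3-grams, MinHash values come from seeded `blake2b` hashes, and the signature is split into bands whose values serve as the IDs of intermediary "bucket" vertices. All names (`MinHashGraph`, `band_ids`, and so on) and parameters (20 hashes, 10 bands, a 0.5 threshold) are hypothetical. A query hashes to its own buckets, a two-hop traversal (query to bucket to string) collects candidates, and exact Jaccard similarity filters them.

```python
import hashlib
from collections import defaultdict


def shingles(text: str, k: int = 3) -> set[str]:
    """Split a string into overlapping character k-grams (assumed tokenization)."""
    s = text.lower()
    if len(s) < k:
        return {s}  # short strings become a single token, so the set is never empty
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(tokens: set[str], num_hashes: int = 20) -> list[int]:
    """One minimum hash per seeded hash function (seeding scheme is an assumption)."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(), "big"
            )
            for t in tokens
        ))
    return sig


def band_ids(sig: list[int], bands: int = 10) -> list[str]:
    """Split the signature into bands; each band string is a hypothetical bucket vertex ID.

    Any trailing values that do not fill a whole band are ignored.
    """
    rows = len(sig) // bands
    return [
        f"b{i}:" + "-".join(map(str, sig[i * rows:(i + 1) * rows]))
        for i in range(bands)
    ]


class MinHashGraph:
    """Toy graph: string vertices linked to bucket vertices in both directions."""

    def __init__(self, num_hashes: int = 20, bands: int = 10):
        self.num_hashes = num_hashes
        self.bands = bands
        self.name_to_buckets = defaultdict(set)  # string vertex -> bucket vertices
        self.bucket_to_names = defaultdict(set)  # bucket vertex -> string vertices

    def load(self, name: str) -> None:
        """Loading step: compute the signature and add edges to its bucket vertices."""
        sig = minhash_signature(shingles(name), self.num_hashes)
        for b in band_ids(sig, self.bands):
            self.name_to_buckets[name].add(b)
            self.bucket_to_names[b].add(name)

    def fuzzy_match(self, query: str, threshold: float = 0.5) -> list[tuple[str, float]]:
        """Two-hop traversal to collect candidates, then verify with exact Jaccard."""
        sig = minhash_signature(shingles(query), self.num_hashes)
        candidates: set[str] = set()
        for b in band_ids(sig, self.bands):
            candidates |= self.bucket_to_names.get(b, set())
        q = shingles(query)
        scored = [(c, jaccard(q, shingles(c))) for c in candidates]
        return sorted((x for x in scored if x[1] >= threshold), key=lambda x: -x[1])


if __name__ == "__main__":
    g = MinHashGraph()
    for name in ["TigerGraph Inc", "Tiger Graph Inc.", "Neo4j Inc"]:
        g.load(name)
    print(g.fuzzy_match("TigerGraph Inc"))
```

Banding trades recall for precision: with 2 rows per band, two strings become candidates if any one of their 10 bands matches exactly, so pairs with high Jaccard similarity very likely share a bucket while dissimilar pairs rarely do. Because candidate generation is probabilistic, the final Jaccard check (or a string distance function) is what decides the match.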

Company
TigerGraph

Date published
Sept. 6, 2022

Author(s)
Xinyu Chang

Word count
1591

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.