GraphRAG Explained: Enhancing RAG with Knowledge Graphs
Retrieval Augmented Generation (RAG) is a technique that connects external data sources to large language models (LLMs) to improve their output. It gives LLMs access to private or domain-specific data and helps reduce hallucinations, so it has been widely used to power GenAI applications such as AI chatbots and recommendation systems. Microsoft Research introduced GraphRAG, a method that augments RAG retrieval and generation with knowledge graphs (KGs). A baseline RAG uses a vector database to retrieve semantically similar text, while GraphRAG adds a knowledge graph: a data structure that stores pieces of data and links them according to their relationships. A GraphRAG pipeline usually consists of two fundamental processes: indexing and querying. The indexing process includes four key steps: Text Unit Segmentation; Entity, Relationship, and Claims Extraction; Hierarchical Clustering; and Community Summary Generation. In the querying stage, GraphRAG offers two workflows tailored to different kinds of queries: Global Search and Local Search. A comparison of output quality between baseline RAG and GraphRAG shows that GraphRAG significantly improves multi-hop reasoning and the summarization of complex information. The research indicates that GraphRAG surpasses baseline RAG in both comprehensiveness and diversity.
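The indexing and querying stages summarized above can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions, not Microsoft's implementation: every function name here is hypothetical, entity extraction is replaced by a capitalized-word heuristic where a real pipeline would prompt an LLM, claims extraction is omitted, hierarchical clustering is replaced by flat connected components, and community summaries are plain strings rather than LLM-written reports.

```python
import re

def segment(text, size=1):
    """Step 1: split text into units of `size` sentences each."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]

def extract(unit):
    """Step 2 (assumed stand-in for LLM extraction): capitalized words are
    entities, and entities that share a text unit are related."""
    entities = sorted(set(re.findall(r"\b[A-Z][a-z]+\b", unit)))
    edges = [(a, b) for i, a in enumerate(entities) for b in entities[i + 1:]]
    return entities, edges

def build_graph(units):
    """Merge per-unit entities and edges into one undirected graph."""
    graph = {}
    for unit in units:
        entities, edges = extract(unit)
        for entity in entities:
            graph.setdefault(entity, set())
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def communities(graph):
    """Step 3 (assumed stand-in for hierarchical clustering): group entities
    by connected component, which yields a single flat level."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        result.append(sorted(members))
    return result

def summarize(members):
    """Step 4 (assumed stand-in for an LLM-written community summary)."""
    return "Community of " + ", ".join(members)

def build_index(text):
    graph = build_graph(segment(text))
    groups = communities(graph)
    return {"graph": graph, "communities": groups,
            "summaries": [summarize(g) for g in groups]}

def local_search(index, entity):
    """Entity-focused query: return the direct neighbors of one entity."""
    return sorted(index["graph"].get(entity, set()))

def global_search(index):
    """Corpus-wide query: return every community summary, which a real
    system would pass to an LLM to compose a final answer."""
    return index["summaries"]

text = "Alice works with Bob. Bob mentors Carol. Dave manages Erin. Erin reports to Dave."
index = build_index(text)
print(local_search(index, "Bob"))
print(global_search(index))
```

The split between `local_search` and `global_search` mirrors the two query workflows only loosely: the first walks outward from a named entity, while the second reasons over community-level summaries of the whole corpus.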
Company
Zilliz
Date published
Aug. 2, 2024
Author(s)
Cheney Zhang
Word count
3308
Language
English
Hacker News points
None found.