GraphRAG Explained: Enhancing RAG with Knowledge Graphs

Post Details

Company

Zilliz

Date Published

Aug. 2, 2024

Author

Cheney Zhang

Word Count

3,308

Language

English

Hacker News Points

-

Source URL

zilliz.com/blog/graphrag-explained-enhance-rag-with-knowledge-graphs

Summary

Retrieval Augmented Generation (RAG) is a technique that connects external data sources to large language models (LLMs) to improve their output. It lets LLMs access private or domain-specific data and helps mitigate hallucinations, so RAG has been widely used to power GenAI applications such as AI chatbots and recommendation systems. Microsoft Research introduced GraphRAG, a method that augments RAG retrieval and generation with knowledge graphs (KGs). Unlike baseline RAG, which uses a vector database to retrieve semantically similar text, GraphRAG also draws on a knowledge graph, a data structure that stores data and links it based on the relationships between items. A GraphRAG pipeline usually consists of two fundamental processes: indexing and querying. The indexing process includes four key steps: Text Unit Segmentation; Entity, Relationship, and Claims Extraction; Hierarchical Clustering; and Community Summary Generation. The querying stage offers two workflows tailored to different kinds of queries: Global Search and Local Search. A comparison of baseline RAG and GraphRAG output quality shows that GraphRAG significantly improves multi-hop reasoning and complex information summarization. The research indicates that GraphRAG surpasses baseline RAG in both comprehensiveness and diversity.
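
To make the knowledge-graph idea concrete, here is a minimal Python sketch of why a graph helps with multi-hop questions. It is not the GraphRAG implementation: the triples, the function names (build_graph, entities_within, communities), and the use of connected components in place of hierarchical clustering are all assumptions made for illustration. In a real pipeline, an LLM would extract the entities and relationships from text units.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) triples. In a real GraphRAG
# pipeline these would be extracted from text units by an LLM.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "based_in", "Berlin"),
    ("Carol", "works_at", "Globex"),
]


def build_graph(triples):
    """Build an undirected adjacency map: entity -> [(relation, neighbor)]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((f"inverse_{relation}", subject))
    return dict(graph)


def entities_within(graph, start, max_hops):
    """Breadth-first search: map each reachable entity to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for _, neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def communities(graph):
    """Group entities into connected components.

    Assumption: this is a simple stand-in for the hierarchical
    clustering step, not the algorithm GraphRAG itself uses.
    """
    seen, groups = set(), []
    for entity in graph:
        if entity not in seen:
            # len(graph) hops is enough to reach every connected entity.
            group = set(entities_within(graph, entity, len(graph)))
            seen |= group
            groups.append(group)
    return groups


graph = build_graph(TRIPLES)
print(entities_within(graph, "Alice", 2))
print(communities(graph))
```

In this toy graph, a question such as "In which city does Alice work?" needs two hops (Alice to Acme, Acme to Berlin), which the traversal finds even though no single triple mentions both Alice and Berlin. The groups returned by communities are the units that a summary-generation step would then describe.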