Building Knowledge Graphs at Production Scale for GenAI

Company

DataStax

Date Published

Oct. 16, 2024

Author

Word count

615

Language

English

Hacker News points

None

URL

www.datastax.com/blog/building-knowledge-graphs-at-production-scale-for-genai

Summary

Knowledge graphs are increasingly used to improve the results of retrieval-augmented generation (RAG) applications, but most examples show how to build a knowledge graph from only a small number of documents. The typical approach extracts fine-grained, entity-centric information (individual entities and the relationships between them), which does not scale well to large datasets because of the time and cost involved. Content-centric knowledge graphs, such as those built with GraphVectorStore, are an easier and more efficient alternative: they keep whole chunks as nodes and add links between chunks instead of extracting individual entities. This article compares the two approaches on a subset of Wikipedia articles from the 2wikimultihop dataset. When loading large datasets, the content-centric approach is significantly faster and less expensive than the entity-centric method, and loading in parallel reduces processing time further. The content-centric approach also produces more accurate and relevant answers to questions posed over the loaded data. Overall, GraphVectorStore offers a practical way to build knowledge graphs at scale for RAG applications.
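To make the content-centric idea concrete, here is a minimal in-memory sketch. It is not the GraphVectorStore API; every name in it (`Chunk`, `extract_links`, `build_graph`, `traverse`) is hypothetical. It assumes each chunk declares the tags that point to it (for example, its article title) and that outgoing links come from wiki-style `[[...]]` markup. Whole chunks are the nodes, edges connect a chunk's outgoing tags to other chunks' incoming tags, and retrieval expands breadth-first from seed chunks, which in a real system would be the top vector-search hits. Link extraction is independent per chunk, so it is mapped over a thread pool. Threads mainly pay off when the per-chunk work is I/O-bound (embedding calls, database writes) rather than a cheap regex like the one used here.

```python
from __future__ import annotations

import re
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk: text plus the link tags attached to it."""
    id: str
    text: str
    incoming: set[str] = field(default_factory=set)  # tags that point *to* this chunk
    outgoing: set[str] = field(default_factory=set)  # tags this chunk points *at*


# Assumption: links are written as wiki-style markup, e.g. "[[Paris]]".
LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")


def extract_links(chunk: Chunk) -> Chunk:
    """Rule-based link extraction: no entity or relationship extraction per chunk."""
    chunk.outgoing = {m.strip().lower() for m in LINK_RE.findall(chunk.text)}
    return chunk


def build_graph(chunks: list[Chunk], workers: int = 4) -> dict[str, set[str]]:
    """Return adjacency sets: chunk id -> ids of the chunks it links to."""
    # Per-chunk extraction is independent, so it can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(extract_links, chunks))

    # Index chunks by the incoming tags they advertise.
    by_incoming: dict[str, set[str]] = defaultdict(set)
    for c in chunks:
        for tag in c.incoming:
            by_incoming[tag].add(c.id)

    # Edge a -> b whenever an outgoing tag of a matches an incoming tag of b.
    edges: dict[str, set[str]] = {c.id: set() for c in chunks}
    for c in chunks:
        for tag in c.outgoing:
            edges[c.id] |= by_incoming.get(tag, set()) - {c.id}
    return edges


def traverse(edges: dict[str, set[str]], seeds: list[str], depth: int = 1) -> set[str]:
    """Breadth-first expansion from seed chunks, up to `depth` hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, d = frontier.popleft()
        if d >= depth:
            continue
        for nxt in sorted(edges.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen


if __name__ == "__main__":
    docs = [
        Chunk("a", "The Eiffel Tower is in [[Paris]].", incoming={"eiffel tower"}),
        Chunk("b", "Paris is the capital of [[France]].", incoming={"paris"}),
        Chunk("c", "France is in Europe.", incoming={"france"}),
    ]
    graph = build_graph(docs)
    print(traverse(graph, ["a"], depth=2))
```

With `depth=1`, a seed of `"a"` pulls in only the directly linked chunk `"b"`; with `depth=2` it also reaches `"c"`, which is the kind of multi-hop retrieval the 2wikimultihop questions call for. Because the links come from cheap metadata rather than entity extraction, building the graph grows with the number of chunks and links, not with per-entity processing.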