YouTube Transcripts Into Knowledge Graphs for RAG Applications

Company

Neo4j

Date Published

Jan. 22, 2024

Author

Alex Gilmore

Word count

1770

Language

English

Hacker News points

None

URL

neo4j.com/blog/developer/youtube-transcripts-knowledge-graphs-rag

Summary

This blog post shows how to turn YouTube video transcripts into a knowledge graph for Retrieval-Augmented Generation (RAG) applications. The project uses Google Cloud Platform, Neo4j, and LangChain to create a document from each transcript, store the resulting documents in a Neo4j graph database, and embed only the smaller child chunks of the text using spaCy embeddings. The process involves setting up services such as Google Cloud Storage and a Neo4j AuraDB instance, scraping transcripts from YouTube videos, chunking the transcripts into manageable pieces, loading them into the Neo4j graph database, and creating an index on the embedding property for vector search. The result is a simple knowledge graph that can be used for RAG applications; building a basic RAG application is planned for the next blog post.
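
The chunking step described in the summary, where only the smaller child chunks are embedded, can be illustrated with a minimal sketch. This is not the post's actual code: it uses only the Python standard library rather than LangChain, and the names `Chunk`, `split_words`, `parent_child_chunks`, and the word-count sizes are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of transcript text; parents hold a list of smaller child chunks."""
    text: str
    children: list = field(default_factory=list)


def split_words(text: str, size: int) -> list:
    """Split text into consecutive pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def parent_child_chunks(transcript: str, parent_size: int = 200,
                        child_size: int = 50) -> list:
    """Build parent chunks, each carrying the child chunks cut from its text.

    In a pipeline like the one summarized above, only the child chunks
    would receive embeddings; the parent supplies the wider context.
    Sizes are in words and are assumptions, not values from the post.
    """
    parents = []
    for parent_text in split_words(transcript, parent_size):
        children = [Chunk(text=t) for t in split_words(parent_text, child_size)]
        parents.append(Chunk(text=parent_text, children=children))
    return parents


if __name__ == "__main__":
    demo = " ".join(f"word{i}" for i in range(450))
    for i, parent in enumerate(parent_child_chunks(demo)):
        print(f"parent {i}: {len(parent.children)} child chunks")
```

In a graph model, each parent and child would presumably become a node, with a relationship linking a child to its parent, so that a vector search over child embeddings can return the larger parent text as context.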