Company
Date Published
Author
Nathan Smith
Word count
3843
Language
English
Hacker News points
None

Summary

Neo4j Graph Data Science (GDS) was used to extract topics from documents in a vector store and to build a knowledge graph representing the documents and their related topics. The graph's vector search capability supported semantic search over vector representations of both topics and documents. Duplicated or closely related themes were merged to make semantic search more efficient: stemming identified themes that shared common root words, which helped in catching synonyms, and other techniques such as Leiden community detection grouped similar themes together. Of the indexing strategies compared, the long summary theme group index performed best, finding 27% more relevant movies than the movie index. The technique offers a structured approach to topic modeling and knowledge graph creation that improves semantic search in RAG applications.
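
The stem-based merging step mentioned in the summary can be illustrated with a minimal Python sketch. It is not the article's implementation: the `crude_stem` helper is a hypothetical stand-in for a real stemmer, the sample themes are invented, and the Leiden community detection step (which would run in GDS) is not shown. The sketch only shows the idea of grouping themes whose words reduce to the same set of stems.

```python
from collections import defaultdict


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ies", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_key(theme):
    """Order-insensitive key: the sorted set of stems of the theme's words."""
    return tuple(sorted({crude_stem(w) for w in theme.lower().split()}))


def group_by_stems(themes):
    """Group themes that share a stem key, preserving first-seen order."""
    groups = defaultdict(list)
    for theme in themes:
        groups[stem_key(theme)].append(theme)
    return list(groups.values())


# Invented sample themes, for illustration only.
themes = ["time travel", "Time Travels", "traveling time", "heist", "heists", "family"]
groups = group_by_stems(themes)
for group in groups:
    print(group)
```

Each resulting group could then be treated as a single candidate theme before any further grouping by similarity; a production pipeline would likely use an established stemmer rather than the suffix list assumed here.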