Use Deep Memory to Boost RAG Apps' Accuracy by up to +22%
Retrieval Augmented Generation (RAG) systems, which provide context to Large Language Models (LLMs), are widely used by enterprises for applications such as documenting internal processes and automating customer support. The utility of these implementations hinges on retrieval accuracy, and typical RAG applications top out at roughly 70%. Several techniques have been used to push this higher, including feature engineering, fine-tuning embeddings, hybrid or lexical search, reranking final results with cross-encoders, and context-aware fine-tuning of LLMs, but these methods offer only marginal improvements that do not fundamentally change the experience of using LLM apps.

Deep Memory is a new solution that increases Deep Lake's vector search accuracy by up to 22% without impacting search time, by learning an index from labeled queries tailored to your application. Only a few hundred example pairs of query embeddings and their most relevant answers from the vector store are needed; after training, vector search is used as usual. Results can be improved further by combining Deep Memory with lexical search or rerankers. Deep Memory improves retrieval accuracy without altering existing workflows and can significantly reduce inference costs via lower token usage.

Health tech startup Munai, backed by the Bill & Melinda Gates Foundation, achieved an 18.6% boost in vector search accuracy across medical documents using Deep Memory. The solution is now generally available with the latest Deep Lake version.
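The workflow described above can be sketched roughly as follows. This is a minimal, hypothetical sketch loosely based on Deep Lake's VectorStore API; the dataset path, document ids, sample queries, and the embedding_function placeholder are assumptions for illustration, and exact method names and signatures may differ across Deep Lake versions.

```python
# Hypothetical sketch: training Deep Memory on a Deep Lake vector store and
# then searching with it enabled. Names follow Deep Lake's documented
# VectorStore interface but may vary by version -- treat as illustrative.
from deeplake import VectorStore

def embedding_function(texts):
    # Placeholder: return a list of embedding vectors for the given texts,
    # e.g. via an OpenAI, Cohere, or local sentence-transformer model.
    raise NotImplementedError

# The managed Tensor Database runtime is assumed here for Deep Memory.
db = VectorStore(path="hub://<org>/<dataset>", runtime={"tensor_db": True})

# A few hundred labeled pairs: each query maps to the ids of its most
# relevant documents in the store, with a relevance score (1 = relevant).
queries = ["How do I reset a patient record?", "What is the escalation policy?"]
relevance = [[("doc_id_42", 1)], [("doc_id_7", 1), ("doc_id_19", 1)]]

# Train Deep Memory; the learned index is applied at query time without
# changing search latency.
job_id = db.deep_memory.train(
    queries=queries,
    relevance=relevance,
    embedding_function=embedding_function,
)
db.deep_memory.status(job_id)  # poll until training completes

# After training, vector search is used as usual -- only a flag changes.
results = db.search(
    embedding_data="How do I reset a patient record?",
    embedding_function=embedding_function,
    deep_memory=True,
)
```

In this sketch the only change to an existing search call is the deep_memory flag, which reflects the post's claim that accuracy improves without altering existing workflows; the trained index could still be combined with lexical search or a reranker downstream.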
Company
Activeloop
Date published
Sept. 28, 2023
Author(s)
Davit Buniatyan
Word count
1294
Language
English
Hacker News points
None found.