/plushcap/analysis/redis/redis-using-redis-for-real-time-rag-goes-beyond-a-vector-database

Using Redis for real-time RAG goes beyond a Vector Database

What's this blog post about?

Retrieval Augmented Generation (RAG) has become the standard architecture for GenAI applications that need access to private data. The key challenge is keeping the application fast once AI is added: Paul Buchheit's "100ms Rule" holds that every interaction should respond in under 100ms to feel instantaneous, yet a typical RAG architecture averages 1,513ms end-to-end, which is far too slow for good user engagement. Redis offers three main datastore capabilities for AI that address this: vector search, semantic caching, and LLM memory. Together these features enable real-time RAG by cutting latency across the retrieval and generation pipeline. Using them, a GenAI application can reach an average end-to-end response time of 389ms, roughly 3.9x faster than the non-real-time RAG baseline (1,513ms) and much closer to the 100ms Rule.
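
To make the summary concrete, below is a minimal, hypothetical sketch of the Redis primitive that underpins both the vector search and semantic caching capabilities mentioned above: a KNN query against an HNSW vector index, written with the redis-py client against a Redis instance with the search module enabled. The index name rag_docs, the doc: key prefix, the 384-dimension embedding size, and the embed() stub are illustrative assumptions, not details from the post.

```python
"""Sketch of real-time RAG retrieval on Redis: index document chunks with
their embeddings, then run a KNN vector query to fetch context for the LLM.
Names, dimensions, and the fake embed() function are placeholders."""
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)
DIM = 384  # must match the embedding model's output dimension

# 1. Create a vector index over document chunks (run once; errors if it exists).
r.ft("rag_docs").create_index(
    (
        TextField("content"),
        VectorField("embedding", "HNSW",
                    {"TYPE": "FLOAT32", "DIM": DIM, "DISTANCE_METRIC": "COSINE"}),
    ),
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding model call (returns a fake vector)."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.random(DIM, dtype=np.float32)

# 2. Store a chunk of private data with its embedding as raw float32 bytes.
r.hset("doc:1", mapping={
    "content": "Redis supports HNSW vector search for RAG retrieval.",
    "embedding": embed("Redis supports HNSW vector search.").tobytes(),
})

# 3. Retrieve the 3 nearest chunks to the user's question; the returned
#    content is what gets injected into the LLM prompt as context.
question = "How does Redis speed up RAG?"
knn = (
    Query("*=>[KNN 3 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("content", "score")
    .dialect(2)
)
hits = r.ft("rag_docs").search(knn, query_params={"vec": embed(question).tobytes()})
context = [doc.content for doc in hits.docs]
```

A semantic cache applies the same primitive to prompt/response pairs: before calling the LLM, the application runs a KNN lookup on the embedded prompt and, if a sufficiently similar prompt is already cached, returns the stored answer and skips the model call entirely, which is where most of the end-to-end latency savings described above come from.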

Company
Redis

Date published
June 13, 2024

Author(s)
Yiftach Shoolman

Word count
1364

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.