Large language models (LLMs) are increasingly used to answer questions about corporate data by connecting them to specialized data sources through retrieval-augmented generation (RAG). RAG integrates a retrieval component into the generative process, letting LLMs draw on private or corporate data to produce more accurate responses. Traditional RAG approaches, however, focus exclusively on text, leaving out the information-rich images and charts found in slide decks and reports.

A new multimodal RAG template addresses this gap by allowing models to process and reason across both text and images, paving the way for more comprehensive and nuanced AI apps. The template uses Redis together with OpenAI's combined text and vision model, GPT-4V, to index documents and their summaries efficiently, and it reduces redundant work by storing responses to previously answered questions in a semantic cache. With it, developers can build sophisticated AI apps that understand and leverage diverse data types, all powered by a single backend technology: Redis.
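
To make the flow concrete, here is a minimal Python sketch of how such a pipeline might be wired up: a vision model summarizes each image, the summary is embedded and stored in a Redis vector index, and questions are answered by KNN search over those summaries (the same embed-and-search pattern underlies a semantic cache). The model names, index schema, and field names below are illustrative assumptions, not the template's actual code.

```python
# Hedged sketch of a multimodal RAG flow on Redis.
# Assumes the `openai`, `redis`, and `numpy` packages; all names are illustrative.
import base64
import numpy as np
import redis
from openai import OpenAI
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

client = OpenAI()
r = redis.Redis()

def summarize_image(path: str) -> str:
    """Ask a vision-capable model to describe a chart or slide so it can be indexed as text."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable GPT-4 model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this image for retrieval."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

def embed(text: str) -> bytes:
    """Embed text and pack it as FLOAT32 bytes for Redis vector search."""
    vec = client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
    return np.asarray(vec, dtype=np.float32).tobytes()

# Create a vector index over the image summaries (run once).
r.ft("docs").create_index(
    fields=[
        TextField("summary"),
        VectorField("embedding", "HNSW",
                    {"TYPE": "FLOAT32", "DIM": 1536, "DISTANCE_METRIC": "COSINE"}),
    ],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

def index_image(doc_id: str, path: str) -> None:
    """Summarize an image and store the summary plus its embedding in Redis."""
    summary = summarize_image(path)
    r.hset(f"doc:{doc_id}", mapping={"summary": summary, "embedding": embed(summary), "path": path})

def retrieve(question: str, k: int = 3):
    """KNN search over indexed summaries; a semantic cache reuses this same pattern
    to look up prior answers to semantically similar questions before calling the LLM."""
    q = (Query(f"*=>[KNN {k} @embedding $vec AS score]")
         .sort_by("score")
         .return_fields("summary", "path")
         .dialect(2))
    return r.ft("docs").search(q, query_params={"vec": embed(question)}).docs
```

The retrieved summaries (and, if needed, the original images) can then be passed back to the vision model as context to generate the final answer, with new question–answer pairs written into the same Redis instance to serve as the semantic cache.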