RAG Without OpenAI: BentoML, OctoAI and Milvus
This tutorial demonstrates how to build retrieval-augmented generation (RAG) applications with large language models (LLMs) without relying on OpenAI. The workflow has four steps: serving embeddings with BentoML, inserting the embedded data into a vector database, setting up an LLM, and providing instructions to the LLM. The three key components are BentoML for serving embeddings, OctoAI for accessing open-source models, and Milvus as the vector database. The example uses BentoML's Sentence Transformers Embeddings repository, a local Milvus instance run with Docker Compose, and the Nous Hermes fine-tuned Mixtral model from OctoAI.
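The flow summarized above (embed, store, retrieve, instruct) can be sketched in a few lines of dependency-free Python. This is only an illustration of the data flow, not the tutorial's code: `embed` is a toy bag-of-words stand-in for the BentoML embedding service, `InMemoryStore` is a hypothetical stand-in for a Milvus collection, and `build_prompt` only assembles the text that would be sent to the model hosted on OctoAI. No real BentoML, Milvus, or OctoAI API is called.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Assumed stand-in for an embedding service."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []

    def insert(self, text: str) -> None:
        # Store the vector next to the original text, as a vector database would.
        self.rows.append((embed(text), text))

    def search(self, query: str, limit: int = 1) -> list[str]:
        # Rank stored texts by similarity to the query vector.
        query_vec = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vec, row[0]), reverse=True)
        return [text for _, text in ranked[:limit]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the instructions and retrieved context for the LLM."""
    joined = "\n".join(context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


store = InMemoryStore()
for doc in (
    "Milvus is a vector database.",
    "BentoML serves machine learning models.",
    "OctoAI hosts open-source language models.",
):
    store.insert(doc)

question = "What is a vector database?"
prompt = build_prompt(question, store.search(question, limit=1))
print(prompt)
```

In a real deployment, each stand-in would be replaced by a network call to the corresponding service; the structure of the loop (embed on insert, embed on query, retrieve, then prompt) stays the same.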
Company
Zilliz
Date published
April 23, 2024
Author(s)
Yujian Tang
Word count
2820
Language
English
Hacker News points
None found.