Running a local RAG pipeline with Verba and Llama3 via Ollama
Retrieval Augmented Generation (RAG) is a design pattern that lets applications leverage large language models with contextual data that was not part of the model's training set. Weaviate has built Verba, an example RAG implementation that showcases various chunking and retrieval techniques for building RAG applications. This blog post explores how to run Verba locally against Weaviate Cloud or a local Weaviate instance, and how to connect it to Ollama for local inference. It also discusses alternative deployment options for Weaviate, including Weaviate Cloud, Weaviate Enterprise Cloud, Dockerized Weaviate, and Kubernetes environments via Helm charts.
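As a rough illustration of the pattern the post describes, the sketch below retrieves context chunks from a local Weaviate instance and then generates an answer with Llama3 through Ollama. The collection name `VerbaChunk` and its `content` property are hypothetical placeholders, not Verba's actual internal schema, and the snippet assumes the collection has a vectorizer module configured.

```python
# Minimal RAG sketch: retrieve context from a local Weaviate instance,
# then generate an answer with Llama3 via Ollama.
# Assumes `pip install weaviate-client ollama`, Weaviate running on its
# default local ports, and Ollama serving the llama3 model locally.
import weaviate
import ollama

query = "How does Verba chunk documents?"

# Connect to a locally running Weaviate (Docker default ports).
client = weaviate.connect_to_local()
try:
    # Hypothetical collection/property names for illustration only;
    # Verba manages its own schema internally.
    chunks = client.collections.get("VerbaChunk")
    # near_text requires a vectorizer module on the collection.
    results = chunks.query.near_text(query=query, limit=3)
    context = "\n\n".join(obj.properties["content"] for obj in results.objects)
finally:
    client.close()

# Ask Llama3 to answer using only the retrieved context.
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(response["message"]["content"])
```

This is only a sketch of the retrieve-then-generate flow; Verba itself wires these steps together behind its UI, so none of this code is needed to run it.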
Company: Weaviate
Date published: July 9, 2024
Author(s): Adam Chan
Word count: 1974
Language: English
Hacker News points: None found.