Building a retrieval-augmented generation (RAG) pipeline involves complex engineering challenges and requires ongoing expertise in LLMs, retrieval, specialized MLOps, and more. A RAG pipeline consists of two major flows: an ingest flow that handles data extraction, chunking, encoding, and storage; and a query flow that responds to user queries through encoding, retrieval, reranking, a call to the generative LLM, and hallucination detection. Within RAG, smaller specialized models have emerged that can outperform larger models on the specific tasks they target. Vectara provides an end-to-end RAG platform that abstracts this complexity behind an easy-to-use API, allowing users to build their own RAG applications quickly and efficiently.
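
To make the two flows concrete, here is a minimal, self-contained sketch in Python. It is a toy illustration only and does not use or represent Vectara's API. Every name in it (`ToyRagPipeline`, `chunk`, `encode`, `generate`, `is_grounded`) is hypothetical. The bag-of-words encoder, the term-overlap reranker, the stubbed generator, and the token-coverage grounding check are simple stand-ins assumed for illustration; a real pipeline would use an embedding model, a vector database, a trained reranker, an LLM, and a dedicated hallucination detector.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, max_words=20):
    """Naive chunker: split text into fixed-size windows of words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def encode(text):
    """Toy encoder: bag-of-words counts stand in for a neural embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate(question, context):
    """Stub for the generative LLM call: echoes the best-ranked chunk."""
    return context[0] if context else ""


def is_grounded(answer, context):
    """Crude hallucination check: every answer token must appear in the context."""
    if not context:
        return False
    support = set(tokenize(" ".join(context)))
    return all(tok in support for tok in tokenize(answer))


class ToyRagPipeline:
    def __init__(self):
        self.store = []  # in-memory stand-in for a vector store: (text, vector)

    def ingest(self, document):
        """Ingest flow: chunk the extracted text, encode each chunk, store it."""
        for piece in chunk(document):
            self.store.append((piece, encode(piece)))

    def query(self, question, top_k=3):
        """Query flow: encode, retrieve, rerank, generate, check grounding."""
        q_vec = encode(question)
        # Retrieval: keep the top_k chunks by cosine similarity.
        candidates = sorted(
            self.store, key=lambda item: cosine(q_vec, item[1]), reverse=True
        )[:top_k]
        # Reranking (placeholder): prefer chunks covering more distinct query terms.
        q_terms = set(q_vec)
        reranked = sorted(
            candidates, key=lambda item: len(q_terms & set(item[1])), reverse=True
        )
        context = [text for text, _ in reranked]
        answer = generate(question, context)
        return {
            "answer": answer,
            "context": context,
            "grounded": is_grounded(answer, context),
        }


if __name__ == "__main__":
    pipeline = ToyRagPipeline()
    pipeline.ingest("The ingest flow extracts data, chunks it, encodes it, and stores it.")
    pipeline.ingest("The query flow retrieves chunks, reranks them, and calls the LLM.")
    print(pipeline.query("What does the ingest flow do?", top_k=1))
```

Each stage sits behind its own small function, so any one of them can be swapped for a production component without touching the others. That separation is also where the engineering effort described above accumulates, since every stage needs its own model, infrastructure, and tuning.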