RAG Evaluation Using Ragas
Retrieval Augmented Generation (RAG) is an approach to building AI-powered chatbots that answer questions using documents retrieved from an external knowledge source, rather than relying only on what the model learned during training. Retrieval accuracy for natural language queries is still often low, so it is worth running experiments to tune RAG parameters before deployment. Large Language Models (LLMs) are increasingly used as judges in RAG evaluation: they automate and speed up evaluation, scale well, and cut the time and cost of manual human labeling. Two primary flavors of LLM-as-judge evaluation are MT-Bench and Ragas, with the latter emphasizing automation and scalability for RAG evaluations. A Ragas evaluation needs four data points per sample: the question, the retrieved contexts, the generated answer, and the ground truth answer.
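The four data points listed above could be collected roughly as follows. This is a minimal sketch in plain Python and does not call the Ragas library itself. The column names (`question`, `contexts`, `answer`, `ground_truth`) are an assumption based on the names in the summary, and the exact names Ragas expects may differ between versions. The helper `build_eval_columns` and the sample record are hypothetical and for illustration only.

```python
from typing import Any, Dict, List

# Assumed column names; check the Ragas version in use for the exact schema.
REQUIRED_FIELDS = ("question", "contexts", "answer", "ground_truth")


def build_eval_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Validate row-style records and return them as a column-oriented dict."""
    columns: Dict[str, List[Any]] = {field: [] for field in REQUIRED_FIELDS}
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        # Each question usually has several retrieved chunks, so contexts is a list.
        if not isinstance(record["contexts"], list):
            raise TypeError(f"record {index}: 'contexts' must be a list of strings")
        for field in REQUIRED_FIELDS:
            columns[field].append(record[field])
    return columns


# Hypothetical sample record, invented purely for illustration.
sample_records = [
    {
        "question": "What does RAG stand for?",
        "contexts": ["RAG is short for Retrieval Augmented Generation."],
        "answer": "RAG stands for Retrieval Augmented Generation.",
        "ground_truth": "Retrieval Augmented Generation",
    }
]

eval_columns = build_eval_columns(sample_records)
```

A column-oriented dict like `eval_columns` is the shape accepted by `datasets.Dataset.from_dict` in the Hugging Face `datasets` package, which is one plausible way to hand the data to an evaluation run.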
Company
Zilliz
Date published
March 18, 2024
Author(s)
Christy Bergman
Word count
1018
Language
English
Hacker News points
2