Company
Vectara
Date Published
April 16, 2024
Author
Forrest Bao & Miaoran Li & Rogger Luo
Word count
1801
Language
English
Hacker News points
None

Summary

The Hughes Hallucination Evaluation Model (HHEM) v2 is a significant upgrade over its predecessor, offering improved performance in evaluating the factual consistency of Large Language Model (LLM) outputs. HHEM v2 adds multilinguality, an unlimited context window, and calibration, making it more practical for Retrieval-Augmented Generation (RAG) applications. The model has been thoroughly tested against two of the latest hallucination benchmarks, AggreFact and RAGTruth, where it outperforms GPT-based LLM judges while maintaining low latency. HHEM v2 is calibrated to produce a probabilistic score, translating raw model scores into meaningful probabilities. Its performance is particularly notable in detecting extrinsic hallucinations, which occur when unrelated pieces of information are stitched together. The model's calibration ensures that the scores Vectara provides are aligned with detection probabilities on authoritative data. With these improved features and superior performance, HHEM v2 powers Vectara's Factual Consistency Score, providing a reliable tool for developing trustworthy RAG-based solutions.
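To illustrate the calibration idea in general terms — this is a minimal sketch, not Vectara's actual implementation, and the coefficients are made up for demonstration — logistic (Platt) scaling is a common way to map a model's raw score onto a probability:

```python
import math

def platt_scale(raw_score: float, a: float = -4.0, b: float = 2.0) -> float:
    """Map a raw model score to a calibrated probability in (0, 1)
    using logistic (Platt) scaling: p = 1 / (1 + exp(a*s + b)).

    The coefficients a and b are illustrative placeholders here;
    in practice they are fit on labeled evaluation data so that the
    output matches observed hallucination-detection probabilities.
    """
    return 1.0 / (1.0 + math.exp(a * raw_score + b))

# Higher raw scores map to higher calibrated probabilities
# (because a < 0 makes the logistic increasing in raw_score).
low = platt_scale(0.1)
high = platt_scale(0.9)
```

A calibrated score like this is what lets a probability such as 0.83 be read as "about an 83% chance the response is factually consistent," rather than an arbitrary model output.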