Company
Vectara
Date Published
April 16, 2024
Author
Forrest Bao & Miaoran Li & Rogger Luo
Word count
1801
Language
English
Hacker News points
None

Summary

The Hughes Hallucination Evaluation Model (HHEM) v2 is a significant upgrade over its predecessor, offering improved performance in evaluating the factual consistency of Large Language Model (LLM) outputs. HHEM v2 adds multilinguality, an unlimited context window, and calibration, making it more practical for Retrieval-Augmented Generation (RAG) applications. The model has been thoroughly tested against two of the latest hallucination benchmarks, AggreFact and RAGTruth, where it outperforms GPT-based LLM judges while maintaining low latency. HHEM v2 is calibrated to produce a probabilistic score, translating raw model scores into meaningful probabilities. Its performance is particularly notable in detecting extrinsic hallucinations, which occur when unrelated pieces of information are stitched together. The model's calibration ensures that the scores Vectara provides are aligned with detection probabilities on authoritative data. With these improved features and superior performance, HHEM v2 powers Vectara's Factual Consistency Score, providing a reliable tool for developing trustworthy RAG-based solutions.
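To illustrate the calibration idea in general terms — this is a minimal sketch, not Vectara's actual implementation, and the coefficients are made up for demonstration — logistic (Platt) scaling is a common way to map a model's raw score onto a probability:

```python
import math

def platt_scale(raw_score: float, a: float = -4.0, b: float = 2.0) -> float:
    """Map a raw model score to a calibrated probability in (0, 1)
    using logistic (Platt) scaling: p = 1 / (1 + exp(a*s + b)).

    The coefficients a and b are illustrative placeholders here;
    in practice they are fit on labeled evaluation data so that the
    output matches observed hallucination-detection probabilities.
    """
    return 1.0 / (1.0 + math.exp(a * raw_score + b))

# Higher raw scores map to higher calibrated probabilities
# (because a < 0 makes the logistic increasing in raw_score).
low = platt_scale(0.1)
high = platt_scale(0.9)
```

A calibrated score like this is what lets a probability such as 0.83 be read as "about an 83% chance the response is factually consistent," rather than an arbitrary model output.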