HHEM 2.1: A Better Hallucination Detection Model and a New Leaderboard

Company

Vectara

Date Published

Aug. 5, 2024

Author

Ofer Mendelevitch & Forrest Bao & Miaoran Li & Rogger Luo

Word count

1634

Language

English

Hacker News points

URL

vectara.com/blog/hhem-2-1-a-better-hallucination-detection-model

Summary

HHEM-2.1 is an improved version of the earlier hallucination detection model HHEM-2.0. It outperforms both GPT-3.5-Turbo and GPT-4 at hallucination detection in three languages: English, French, and German. In benchmarks against popular LLMs, including GPT-3.5-Turbo and GPT-4, it achieves a higher F1 score, precision, and recall, and it is more accurate than its predecessors at identifying hallucinations where they occur. The model has been integrated into Vectara's RAG-as-a-service platform and is included automatically with every call to the Query API, making it easier for enterprise developers to build trusted GenAI applications. HHEM-2.1 is also available as an open-source model on Hugging Face and Kaggle, giving developers a more accessible option. Finally, the new model powers a revamped HHEM leaderboard that ranks LLMs by their likelihood to hallucinate, providing a more accurate reflection of their true hallucination rates.
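
For readers unfamiliar with the metrics named in the summary, the following is a minimal sketch of how precision, recall, and F1 are computed for a binary hallucination detector. It is not Vectara's evaluation code: the function name `detection_metrics` and the label lists are hypothetical, made up for illustration, and the sketch assumes that label 1 means "hallucinated" and 0 means "consistent".

```python
def detection_metrics(y_true, y_pred):
    """Return (precision, recall, f1), treating label 1 ("hallucinated") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hallucinations correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # consistent text wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hallucinations that were missed

    # Guard against division by zero when there are no predicted or no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only (not benchmark data).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = detection_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision is the share of flagged outputs that really were hallucinations, recall is the share of real hallucinations that were flagged, and F1 is the harmonic mean of the two.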