The Hallucination Index benchmark evaluates how well popular Large Language Models (LLMs) generate correct and contextually relevant text, with a focus on detecting model hallucinations. The index is designed to help teams select the right LLM for their project and use case by providing a framework that addresses the variability and nuance inherent in generative AI. It uses seven rigorous benchmarking datasets to evaluate each LLM's performance across three task types: Question & Answer without Retrieval-Augmented Generation (RAG), Question & Answer with RAG, and Long-form Text Generation. The index ranks LLMs by task type, providing insight into each model's strengths and weaknesses in avoiding hallucinations. By combining quantitative metrics, such as Correctness and Context Adherence, with human evaluations, the Hallucination Index offers a comprehensive framework for detecting hallucinations in generative AI applications.
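
To make the per-task ranking step concrete, here is a minimal sketch, assuming per-response metric scores in the 0–1 range. The `rank_models` function, its data layout, and the `score_fn` callback are illustrative assumptions, not the Hallucination Index's actual implementation; they only show how individual metric scores might be aggregated into a per-task-type leaderboard.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable

# Hypothetical sketch of producing per-task rankings from per-response
# metric scores. The task types follow the Hallucination Index description;
# the scoring callback and data layout are illustrative placeholders.

ScoreFn = Callable[[str, str, dict], float]  # (model, task, example) -> score in [0, 1]

def rank_models(
    models: list[str],
    datasets: dict[str, list[dict]],  # task type -> list of benchmark examples
    score_fn: ScoreFn,                # e.g. a Correctness- or Context Adherence-style scorer
) -> dict[str, list[tuple[str, float]]]:
    """Average each model's metric score per task type, then sort descending."""
    rankings: dict[str, list[tuple[str, float]]] = {}
    for task, examples in datasets.items():
        per_model = defaultdict(list)
        for model in models:
            for example in examples:
                per_model[model].append(score_fn(model, task, example))
        # Highest average score first: the top entry is the best model for this task.
        rankings[task] = sorted(
            ((m, mean(scores)) for m, scores in per_model.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
    return rankings
```

In practice, a Correctness- or Context Adherence-style scorer would be supplied as `score_fn` depending on the task type being evaluated, and the resulting per-task rankings mirror the way the index reports model strengths and weaknesses separately for each task.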