[AARR] To Believe or Not to Believe Your LLM
Researchers have developed a method for detecting when a large language model's (LLM's) response is uncertain. They distinguish between two kinds of uncertainty: epistemic (lack of knowledge) and aleatoric (irreducible randomness). Using an information-theoretic metric, they can reliably identify cases where epistemic uncertainty is high, signalling that the model's output may be unreliable or even a hallucination.

The main idea is to exploit how differently an LLM behaves when previous (potentially incorrect) responses are repeatedly appended to the prompt: the metric quantifies epistemic uncertainty as the sensitivity of the model's output distribution to this iterative conditioning. Building on this, the paper introduces a score-based hallucination detection algorithm that constructs a "pseudo joint distribution" over multiple responses and uses its mutual information as a score measuring how strongly the LLM is suspected of hallucinating on the given query. Experiments show that the MI-based method performs comparably to a semantic-entropy baseline on predominantly single-label datasets and significantly outperforms simpler baselines such as the probability of the greedy response and self-verification methods.
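To make the pseudo-joint/mutual-information idea concrete, here is a minimal Python sketch, not the paper's implementation: `mi_hallucination_score`, `cond_prob`, and the toy model callbacks are hypothetical names, and `cond_prob(history, y)` is assumed to return the model's probability of answering `y` to the query after the responses in `history` have been appended to the prompt.

```python
import itertools
import math
from typing import Callable, Sequence


def mi_hallucination_score(
    candidates: Sequence[str],
    cond_prob: Callable[[Sequence[str], str], float],
    k: int = 2,
) -> float:
    """Mutual-information score over a pseudo joint distribution of k responses.

    A larger score means the output distribution is easily swayed by answers
    injected into the prompt, i.e. high epistemic uncertainty.
    """
    # Pseudo joint: Q(y1..yk) = prod_i p(y_i | query, y_1..y_{i-1}),
    # built by iteratively appending earlier responses to the prompt.
    joint = {}
    for combo in itertools.product(candidates, repeat=k):
        p, history = 1.0, []
        for y in combo:
            p *= cond_prob(history, y)
            history.append(y)
        joint[combo] = p
    total = sum(joint.values())
    joint = {c: p / total for c, p in joint.items()}  # normalise

    # Marginal distribution of each response slot under the pseudo joint.
    marginals = [{} for _ in range(k)]
    for combo, p in joint.items():
        for i, y in enumerate(combo):
            marginals[i][y] = marginals[i].get(y, 0.0) + p

    # Mutual information = KL(joint || product of marginals).
    mi = 0.0
    for combo, p in joint.items():
        if p > 0.0:
            q = math.prod(marginals[i][y] for i, y in enumerate(combo))
            mi += p * math.log(p / q)
    return mi


# Toy check: a model that parrots whatever answer was injected into the prompt
# (high epistemic uncertainty) versus one that ignores the injected answer.
def swayable(history, y):
    if history:
        return 0.9 if y == history[-1] else 0.1
    return 0.5


def confident(history, y):
    return 0.9 if y == "Paris" else 0.1


print(mi_hallucination_score(["Paris", "Lyon"], swayable))   # ~0.37 -> likely hallucination
print(mi_hallucination_score(["Paris", "Lyon"], confident))  # ~0.0  -> trustworthy
```

In this toy setup the easily-swayed model gets a high score because injecting a prior answer reshapes its output distribution, while the confident model's distribution is unchanged and its mutual information is near zero, matching the intuition behind the paper's score.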
Company
Align AI
Date published
June 25, 2024
Author(s)
Align AI R&D Team
Word count
865
Language
English
Hacker News points
None found.