[AARR] To Believe or Not to Believe Your LLM
Researchers have developed a method for detecting when a large language model's (LLM's) response is uncertain. They distinguish between two kinds of uncertainty: epistemic (lack of knowledge) and aleatoric (irreducible randomness). Using an information-theoretic metric, they can reliably identify cases where epistemic uncertainty is high, signalling that the model's output may be unreliable or even a hallucination.

The main idea is to exploit how differently an LLM behaves when previous (potentially incorrect) responses are repeatedly appended to the prompt: the metric quantifies epistemic uncertainty as the sensitivity of the model's output distribution to this iterative conditioning. Building on this, the paper introduces a score-based hallucination detection algorithm that constructs a "pseudo joint distribution" over multiple responses and uses its mutual information as a score measuring how strongly the LLM is suspected of hallucinating on the given query. Experiments show that the MI-based method performs comparably to a semantic-entropy baseline on predominantly single-label datasets and significantly outperforms simpler baselines such as the probability of the greedy response and self-verification methods.
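To make the pseudo-joint/mutual-information idea concrete, here is a minimal Python sketch, not the paper's implementation: `mi_hallucination_score`, `cond_prob`, and the toy model callbacks are hypothetical names, and `cond_prob(history, y)` is assumed to return the model's probability of answering `y` to the query after the responses in `history` have been appended to the prompt.

```python
import itertools
import math
from typing import Callable, Sequence


def mi_hallucination_score(
    candidates: Sequence[str],
    cond_prob: Callable[[Sequence[str], str], float],
    k: int = 2,
) -> float:
    """Mutual-information score over a pseudo joint distribution of k responses.

    A larger score means the output distribution is easily swayed by answers
    injected into the prompt, i.e. high epistemic uncertainty.
    """
    # Pseudo joint: Q(y1..yk) = prod_i p(y_i | query, y_1..y_{i-1}),
    # built by iteratively appending earlier responses to the prompt.
    joint = {}
    for combo in itertools.product(candidates, repeat=k):
        p, history = 1.0, []
        for y in combo:
            p *= cond_prob(history, y)
            history.append(y)
        joint[combo] = p
    total = sum(joint.values())
    joint = {c: p / total for c, p in joint.items()}  # normalise

    # Marginal distribution of each response slot under the pseudo joint.
    marginals = [{} for _ in range(k)]
    for combo, p in joint.items():
        for i, y in enumerate(combo):
            marginals[i][y] = marginals[i].get(y, 0.0) + p

    # Mutual information = KL(joint || product of marginals).
    mi = 0.0
    for combo, p in joint.items():
        if p > 0.0:
            q = math.prod(marginals[i][y] for i, y in enumerate(combo))
            mi += p * math.log(p / q)
    return mi


# Toy check: a model that parrots whatever answer was injected into the prompt
# (high epistemic uncertainty) versus one that ignores the injected answer.
def swayable(history, y):
    if history:
        return 0.9 if y == history[-1] else 0.1
    return 0.5


def confident(history, y):
    return 0.9 if y == "Paris" else 0.1


print(mi_hallucination_score(["Paris", "Lyon"], swayable))   # ~0.37 -> likely hallucination
print(mi_hallucination_score(["Paris", "Lyon"], confident))  # ~0.0  -> trustworthy
```

In this toy setup the easily-swayed model gets a high score because injecting a prior answer reshapes its output distribution, while the confident model's distribution is unchanged and its mutual information is near zero, matching the intuition behind the paper's score.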
Company
Align AI
Date published
June 25, 2024
Author(s)
Align AI R&D Team
Word count
865
Language
English
Hacker News points
None found.