
[AARR] To Believe or Not to Believe Your LLM

What's this blog post about?

Researchers have developed a method for detecting when a large language model's response is unreliable by distinguishing two kinds of uncertainty: epistemic (lack of knowledge) and aleatoric (irreducible randomness). Using an information-theoretic metric, they can reliably identify queries where epistemic uncertainty is high, signaling that the model's output may be unreliable or even a hallucination. The key idea is to exploit how the LLM's behavior changes when candidate responses are repeatedly fed back into the prompt: the metric measures epistemic uncertainty as the sensitivity of the model's output distribution to the iterative addition of previous (potentially incorrect) responses to the query. Building on this score, the paper introduces a hallucination detection algorithm that constructs a "pseudo joint distribution" over multiple responses and uses its mutual information as a score indicating how strongly the LLM is believed to hallucinate on the given query. Experiments show that the MI-based method performs comparably to the semantic-entropy baseline on predominantly single-label datasets and significantly outperforms simpler baselines such as the probability of the greedy response and self-verification methods.
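The post itself does not include code. As a rough illustrative sketch of the idea, not the authors' implementation, the snippet below builds a pseudo joint distribution over pairs of sampled responses by feeding one answer back into the prompt, then scores the query by the mutual information between the two response slots. The `sample_response` and `response_log_prob` callables, the prompt wording, and the default parameters are all assumptions made for illustration.

```python
# Hypothetical sketch of an MI-based hallucination score in the spirit of the
# method described above. `sample_response(prompt)` and
# `response_log_prob(prompt, resp)` stand in for whatever LLM API is used;
# they are assumptions, not part of the original post or paper.

import itertools
import math
from collections import defaultdict


def pseudo_joint_mi_score(query, sample_response, response_log_prob,
                          num_samples=8, num_repeats=1):
    """Estimate epistemic uncertainty for `query` via the mutual information
    of a pseudo joint distribution over pairs of responses."""
    # 1. Draw a pool of candidate responses for the query.
    responses = [sample_response(query) for _ in range(num_samples)]
    support = sorted(set(responses))

    # 2. Build the pseudo joint distribution F(y1, y2 | query):
    #    P(y1 | query) * P(y2 | query with y1 injected as a prior answer).
    joint = {}
    for y1, y2 in itertools.product(support, repeat=2):
        augmented = query + ("\nA previous answer was: " + y1) * num_repeats
        logp = response_log_prob(query, y1) + response_log_prob(augmented, y2)
        joint[(y1, y2)] = math.exp(logp)
    total = sum(joint.values())
    joint = {pair: p / total for pair, p in joint.items()}

    # 3. Marginals of the pseudo joint.
    p1, p2 = defaultdict(float), defaultdict(float)
    for (y1, y2), p in joint.items():
        p1[y1] += p
        p2[y2] += p

    # 4. Mutual information between the two response slots. A large value
    #    means the second answer is strongly swayed by the injected first
    #    answer, i.e. high epistemic uncertainty (likely hallucination).
    mi = 0.0
    for (y1, y2), p in joint.items():
        if p > 0:
            mi += p * math.log(p / (p1[y1] * p2[y2]))
    return mi
```

In practice, a query whose score exceeds a calibrated threshold would be flagged as one the model is likely hallucinating on.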

Company
Align AI

Date published
June 25, 2024

Author(s)
Align AI R&D Team

Word count
865

Hacker News points
None found.

Language
English
