Company:
Date Published:
Author: Sarah Welsh
Word count: 6235
Language: English
Hacker News points: None

Summary

Language models linearly represent the truth or falsehood of factual statements, and this structure can be extracted with mass-mean probing, a novel technique that generalizes better than traditional probing methods. The paper presents evidence for this linear structure and shows how it can be used to make language model outputs more reliable. The authors' goal is to give humans a way to access what AI systems internally represent as true or false, which would enable more accurate evaluation of their outputs. The research has implications for building more reliable LLMs and for addressing the scalable oversight problem as AI systems become more capable.
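
To make the core idea concrete: a mass-mean probe takes the mean activation over true statements, the mean over false statements, and uses their difference as the probe direction. The sketch below illustrates that idea under stated assumptions; the function names and toy data are illustrative and are not the authors' code.

```python
import numpy as np

def mass_mean_probe(acts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Return the mass-mean direction: difference of class-mean activations.

    acts   : (n_statements, hidden_dim) activations from some model layer
    labels : (n_statements,) 1 for true statements, 0 for false
    """
    mu_true = acts[labels == 1].mean(axis=0)
    mu_false = acts[labels == 0].mean(axis=0)
    return mu_true - mu_false

def predict(acts: np.ndarray, theta: np.ndarray, threshold: float) -> np.ndarray:
    # Project activations onto the probe direction; scores above the
    # threshold are classified as "true".
    return (acts @ theta > threshold).astype(int)

# Toy usage with random vectors standing in for LLM activations.
rng = np.random.default_rng(0)
hidden_dim = 64
true_acts = rng.normal(loc=0.5, size=(100, hidden_dim))
false_acts = rng.normal(loc=-0.5, size=(100, hidden_dim))
acts = np.vstack([true_acts, false_acts])
labels = np.array([1] * 100 + [0] * 100)

theta = mass_mean_probe(acts, labels)
# Place the decision threshold at the midpoint of the projected class means.
midpoint = 0.5 * (true_acts.mean(0) + false_acts.mean(0)) @ theta
accuracy = (predict(acts, theta, midpoint) == labels).mean()
print(f"probe accuracy on toy data: {accuracy:.2f}")
```

Unlike a logistic-regression probe, there are no learned weights beyond the two class means, which is part of why the paper finds this direction generalizes well across datasets.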