MMLU: Better Benchmarking for LLM Language Understanding
Massive Multitask Language Understanding (MMLU) is a challenging NLU benchmark developed by Hendrycks et al. to measure how well an LLM understands language and can solve problems using the knowledge it encountered during training. MMLU contains 15,908 multiple-choice questions spanning 57 subjects, ranging from elementary to advanced professional difficulty, testing qualitative and quantitative analysis, knowledge about human behavior and society, empirical methods, fluid intelligence, and procedural knowledge. The subjects are grouped into four broad categories (humanities, social sciences, STEM, and other); the benchmark is scored by averaging a model's accuracy within each category and then averaging those four category scores into a final score. MMLU has revealed intriguing insights into LLM performance across different subjects and continues to be a valuable tool for identifying specific areas where LLMs need improvement.
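As a minimal sketch of the scoring scheme described above, the snippet below averages per-subject accuracy within each of the four categories and then averages those category scores. The subject names and accuracy values are illustrative placeholders, not actual benchmark results.

```python
from statistics import mean

# Hypothetical per-subject accuracies (illustrative values, not real results),
# keyed by MMLU's four broad categories.
subject_accuracies = {
    "humanities": {"philosophy": 0.62, "world_history": 0.58},
    "social_sciences": {"econometrics": 0.41, "sociology": 0.67},
    "stem": {"college_physics": 0.35, "abstract_algebra": 0.30},
    "other": {"clinical_knowledge": 0.55, "marketing": 0.72},
}

# Step 1: average the model's accuracy within each category.
category_scores = {
    category: mean(scores.values())
    for category, scores in subject_accuracies.items()
}

# Step 2: average the four category scores into the final MMLU score.
final_score = mean(category_scores.values())

print(category_scores)
print(f"Final MMLU score: {final_score:.3f}")
```

Note that this macro-averaging weights each category equally regardless of how many subjects it contains, so categories with fewer subjects contribute proportionally more per subject to the final score.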
Company
Deepgram
Date published
Aug. 22, 2023
Author(s)
Brad Nikkel
Word count
972
Hacker News points
None found.
Language
English