Introducing a new evaluation for creative ability in Large Language Models
HumE-1 (Human Evaluation 1) is a new evaluation method for large language models (LLMs) that relies on human ratings to assess how well these models perform creative tasks in ways that matter to us, evoking the feelings they are meant to evoke. LLMs are already being used to write books and articles, assist legal professionals and healthcare practitioners, and provide mental health support, yet existing benchmarks fail to capture how these models affect our satisfaction and well-being. HumE-1 evaluates LLMs on tasks such as writing motivational quotes, interesting facts, funny jokes, beautiful haikus, charming limericks, scary horror stories, appetizing descriptions of food, and persuasive arguments for charity donations. The evaluation uses honest, naturalistic prompts to better reflect real-life use. In the first round of results, Gemini Ultra performed best, followed by GPT-4 Turbo, with both models leaving significant room for improvement.
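To make the described setup concrete, here is a minimal sketch, not Hume's actual pipeline, of how a human-rating evaluation like this might be organized: naturalistic prompts grouped by creative task type, model completions collected for each, and human ratings averaged per model. All names here (TASKS, generate, collect_rating) are hypothetical placeholders, not HumE-1's real interface.

```python
# Hypothetical sketch of a human-rating evaluation loop (not Hume's actual code).
from statistics import mean

# Naturalistic prompts, one per creative task type mentioned in the post.
TASKS = {
    "motivational_quote": "Write a short quote to motivate someone starting a hard week.",
    "funny_joke": "Tell me a joke that will make my coworkers laugh.",
    "beautiful_haiku": "Write a haiku about the first snow of winter.",
    "scary_story": "Write a two-sentence horror story.",
}

def evaluate(models, generate, collect_rating, raters_per_output=5):
    """Average human ratings per model across all tasks.

    `generate(model, prompt)` returns the model's completion and
    `collect_rating(output, task)` returns one human rating (e.g. 1-10);
    both stand in for whatever generation and annotation
    infrastructure is actually used.
    """
    scores = {}
    for model in models:
        ratings = []
        for task, prompt in TASKS.items():
            output = generate(model, prompt)
            ratings.extend(
                collect_rating(output, task) for _ in range(raters_per_output)
            )
        scores[model] = mean(ratings)
    return scores
```

Ranking models by these averaged ratings would yield an ordering like the one reported in the first round of results.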
Company
Hume
Date published
Feb. 9, 2024
Author(s)
Jeffrey Brooks, PhD