Company: LangChain
Date Published: -
Author: -
Word count: 857
Language: English
Hacker News points: None

Summary

Evaluations are crucial for building reliable LLM-powered applications and agents, but writing them from scratch can be challenging. Two new packages, openevals and agentevals, provide a set of pre-built evaluators and a common framework to help developers get started. These packages distill common evaluation patterns and best practices into ready-to-use solutions, making it easier to create reliable evaluations. They cover a range of use cases, including LLM-as-a-judge evaluations for natural language outputs, structured data evaluations for extracting information from documents, and agent evaluations for assessing the trajectories of actions an agent takes. Both packages also integrate with LangSmith for tracking results over time and sharing them with a team.
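
As an illustration (not taken from the original post), here is a minimal sketch of an LLM-as-a-judge evaluation with openevals, assuming the create_llm_as_judge helper and the prebuilt CORRECTNESS_PROMPT; exact parameter names and the result format may differ across package versions.

    from openevals.llm import create_llm_as_judge
    from openevals.prompts import CORRECTNESS_PROMPT

    # Build an evaluator that uses an LLM judge to grade correctness
    # against a reference answer.
    correctness_evaluator = create_llm_as_judge(
        prompt=CORRECTNESS_PROMPT,
        feedback_key="correctness",
        model="openai:o3-mini",
    )

    # Score a single input/output pair; the sample strings here are
    # hypothetical placeholders.
    eval_result = correctness_evaluator(
        inputs="How far is the moon from Earth?",
        outputs="About 384,000 km on average.",
        reference_outputs="Roughly 384,400 kilometers.",
    )
    print(eval_result)  # e.g. a dict with the feedback key, a score, and a comment

The same evaluator can also be run inside a LangSmith experiment, so that scores are logged and compared across runs over time.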