Company
Date Published
Sept. 16, 2024
Author
Ankur Goyal
Word count
511
Language
English
Hacker News points
None

Summary

The Braintrust Playground now supports custom scoring functions, which are available through both the UI and the API and are intended to help developers and non-technical builders move faster, run more experiments, and build better AI products. Scorers can be defined as LLM-as-a-judge prompts, TypeScript or Python code, or HTTP endpoints, and the Playground can run comparisons across multiple prompts and multiple scorers at once. A custom scorer can combine heuristics (best expressed as code) with LLM-as-a-judge (best expressed as a prompt), and can use the existing evaluators in autoevals as a starting point. Users can upload task and scorer functions to Braintrust from the command line, call custom scorers through the API, and run server-side online evaluations asynchronously on specific logs. Together these enable several workflows: iterating on prompts, monitoring performance over time, finding anomalous cases, and adding logs to a dataset for additional evaluations.
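As a rough illustration of combining a code heuristic with an LLM-as-a-judge score, here is a minimal sketch in plain Python. It does not use the Braintrust SDK or autoevals; the names `heuristic_score`, `combined_score`, and `stub_judge`, the keyword-overlap heuristic, and the equal weighting are all assumptions made for this example, and the judge is a stand-in for a real model call.

```python
from typing import Callable


def heuristic_score(output: str, expected: str) -> float:
    """Hypothetical code heuristic: fraction of expected words found in output."""
    keywords = {word.lower() for word in expected.split()}
    if not keywords:
        return 1.0  # nothing to match against, so nothing is missing
    found = {word.lower() for word in output.split()} & keywords
    return len(found) / len(keywords)


def combined_score(
    output: str,
    expected: str,
    judge: Callable[[str, str], float],
    weight: float = 0.5,
) -> float:
    """Blend the heuristic with a judge score; both are kept in [0, 1]."""
    heuristic = heuristic_score(output, expected)
    # Clamp the judge score in case the judge returns something out of range.
    judged = min(1.0, max(0.0, judge(output, expected)))
    return weight * heuristic + (1.0 - weight) * judged


def stub_judge(output: str, expected: str) -> float:
    """Placeholder for an LLM-as-a-judge call (assumption: returns 0..1)."""
    return 1.0 if output.strip() else 0.0


if __name__ == "__main__":
    print(combined_score("Paris", "Paris capital", stub_judge))
```

In practice the stub would be replaced by a prompt-based judge, and the weighting would be tuned to how much the heuristic is trusted relative to the model's judgment.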