Custom scoring functions in the Braintrust Playground

Company

Braintrust

Date Published

Sept. 16, 2024

Author

Ankur Goyal

Word count

511

Language

English

Hacker News points

None

URL

www.braintrust.dev/blog/custom-scorers

Summary

The Braintrust Playground now supports custom scoring functions, which help developers and non-technical builders move faster, run more experiments, and build better AI products. Custom scorers are available through both the UI and the API, which makes it possible to run sophisticated comparisons across multiple prompts and scoring functions. Scorers can be defined as LLM-as-a-judge prompts, TypeScript or Python code, or HTTP endpoints. A custom scorer can combine heuristics (best expressed as code) with LLM-as-a-judge (best expressed as a prompt), and can use an existing evaluator from autoevals as a starting point. Users can upload task and scorer functions to Braintrust from the command line, call custom scorers through the API, and run server-side online evaluations asynchronously on specific logs. Together these capabilities support several workflows: iterating on prompts, monitoring performance over time, finding anomalous cases, and adding logs to a dataset for additional evaluations.
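To make the "heuristics best expressed as code" idea concrete, here is a minimal sketch of a heuristic scorer in plain Python. It is not taken from the original post: the function name `required_keys_score`, its `output`/`expected` parameters, and the convention of returning a float between 0 and 1 are assumptions chosen for illustration, and the code does not call any Braintrust or autoevals API.

```python
import json
from typing import Optional


def required_keys_score(output: str, expected: Optional[dict] = None) -> float:
    """Hypothetical heuristic scorer (illustrative, not a Braintrust API).

    Returns the fraction of keys in `expected` that appear in the JSON
    object encoded by `output`, as a float between 0.0 and 1.0.
    """
    try:
        parsed = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return 0.0  # output is not parseable JSON

    if not isinstance(parsed, dict):
        return 0.0  # valid JSON, but not an object

    if not expected:
        return 1.0  # valid JSON object and nothing further to check

    present = sum(1 for key in expected if key in parsed)
    return present / len(expected)


if __name__ == "__main__":
    # One of the two expected keys is present in the output.
    print(required_keys_score('{"name": "Ada"}', {"name": "", "email": ""}))
```

A deterministic check like this is cheap to run on every log, while more subjective qualities (tone, helpfulness) are the kind of judgment the summary suggests leaving to an LLM-as-a-judge prompt.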