Company: Braintrust
Date Published:
Author: Albert Zhang
Word count: 851
Language: English
Hacker News points: None

Summary

At Braintrust, AI teams use automated evaluations to speed up development of their applications. Previously, teams relied on manual review and generic benchmarks, which neither scaled nor reflected the specifics of a given application. Automated evaluations give teams a high-leverage way to quickly understand product performance, catch regressions, and tighten their dev loop. Three approaches are discussed: LLM evaluators, heuristics, and comparative evals. Together, these methods give teams a basic structure for automated evaluation, letting developers iterate quickly and making human review time far more valuable.
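
To make the three approaches concrete, below is a minimal sketch of what each evaluator style could look like. It assumes an OpenAI-style chat client; the function names, model choice, and grading prompts are illustrative assumptions, not the post's or Braintrust's actual API.

```python
# Illustrative sketch of the three evaluator styles: heuristic, LLM-as-judge,
# and comparative. Prompts, model name, and function names are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def heuristic_eval(output: str, expected: str) -> float:
    """Heuristic: a deterministic check, here a normalized exact match."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def llm_eval(question: str, output: str) -> float:
    """LLM evaluator: ask a model to grade a single output on a 0-1 scale."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nAnswer: {output}\n"
                "Grade the answer's correctness from 0 to 1. "
                "Reply with only the number."
            ),
        }],
    )
    return float(response.choices[0].message.content.strip())


def comparative_eval(question: str, output_a: str, output_b: str) -> str:
    """Comparative eval: ask a model which of two candidate outputs is better."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n"
                f"Answer A: {output_a}\nAnswer B: {output_b}\n"
                "Which answer is better? Reply with only 'A' or 'B'."
            ),
        }],
    )
    return response.choices[0].message.content.strip()
```

Scores like these can be computed on every change to a prompt or model, so regressions surface automatically and human reviewers only need to look at the cases the evaluators flag.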