Company
Braintrust
Date Published
April 17, 2024
Author
Ankur Goyal
Word count
1002
Language
English
Hacker News points
None

Summary

In AI engineering, establishing a set of real-world examples, known as "evals", is crucial to understanding how changes will impact end users. However, finding great eval data, identifying interesting cases in production, and tying user feedback back to evals are challenging problems. Connecting real-world log data to evals addresses these issues: it lets teams evaluate new and interesting cases found in the wild, drive improvements, and avoid regressions. This is achieved by structuring evals as a function of data, prompts/code, and scoring functions, using tools like Braintrust's Eval function to streamline the process. By capturing and reusing logs, teams can power their evals with real-world examples, making it easier to identify interesting cases and improve AI products. As teams scale, filtering logs down to the most interesting ones becomes critical; applying filters, tracking user feedback, or running online scores can surface test cases that need improvement. Braintrust's solution provides a unified UI for exploring logs and evals, automates code reuse, and stores datasets in a cloud environment.
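To illustrate the "data, prompts/code, and scoring functions" structure described above, here is a minimal sketch using Braintrust's Python SDK and the autoevals scoring library. The project name, example records, and trivial task are hypothetical placeholders; in practice the data would come from captured production logs rather than an inline list.

```python
from braintrust import Eval
from autoevals import Levenshtein


def task(input: str) -> str:
    # Stand-in for the prompt/code under test (e.g. an LLM call).
    return "Hi " + input


Eval(
    "Say Hi Bot",  # hypothetical project name
    # Data: real-world examples; swap this inline list for a dataset built from logs.
    data=lambda: [
        {"input": "Foo", "expected": "Hi Foo"},
        {"input": "Bar", "expected": "Hi Bar"},
    ],
    # Task: the function being evaluated.
    task=task,
    # Scores: functions that grade each output against the expected value.
    scores=[Levenshtein],
)
```

Because the eval is just a function of these three inputs, pointing the data argument at a dataset of interesting production cases is what lets real-world usage drive improvements and catch regressions.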