Company:
Date Published:
Author: Ornella Altunyan, Matt Granmoe
Word count: 1040
Language: English
Hacker News points: None

Summary

The Loom team developed a robust method for evaluating the quality of their auto-generated video titles using generative AI. They started by identifying the key traits of great video titles and the common measures of quality that hold across their use cases. Next, they implemented those objective measures as code-based scorers, automating the quality checks and removing the variability inherent in LLM-judged responses. The team then created initial scorers and iterated on them by feeding in test examples and refining as needed. Through this cycle of defining criteria, implementing scoring functions, evaluating results, and refining, the Loom team established a repeatable system for shipping features faster and more confidently using Braintrust evals.
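The code-based scorers described above can be sketched as a plain deterministic function. This is a minimal illustration, not Loom's actual implementation: the function name, the trait checks (length bounds, capitalization, banned filler words), and the thresholds are all hypothetical assumptions.

```python
# Hypothetical code-based scorer for auto-generated video titles.
# The traits checked here (length, capitalization, filler words)
# are illustrative assumptions, not Loom's actual criteria.

FILLER_WORDS = {"video", "recording", "untitled"}  # assumed banned words

def score_title(title: str) -> float:
    """Return a 0-1 score from simple, deterministic checks."""
    checks = [
        5 <= len(title) <= 80,          # within a reasonable length
        title[:1].isupper(),            # starts with a capital letter
        not title.endswith("."),        # no trailing period
        not (set(title.lower().split()) & FILLER_WORDS),  # no filler words
    ]
    return sum(checks) / len(checks)
```

Because the checks are pure code, the same title always produces the same score, which is what removes the response-to-response variability of an LLM judge.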