Company
Date Published
Oct. 17, 2024
Author
Albert Zhang, Ornella Altunyan
Word count
1041
Language
English
Hacker News points
None

Summary

You built an AI application, curated test examples, picked a scoring function, and ran an evaluation. Now you need to interpret the resulting score and decide what to do next to keep improving both your AI application and your evaluation process. To determine where to focus first, review 5-10 actual examples from your evaluations, inspecting the trace, input, output, and scoring outcome for each row. Analyze these examples to identify patterns in how your application is performing and to check whether your scoring is accurate. You can then decide whether to refine your evaluations or change your application, iterating on both in rapid development loops.
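
As a rough illustration of the review step, the sketch below picks a handful of rows from a finished evaluation run for manual inspection. It is not taken from the original post: `EvalRow`, `pick_rows_to_review`, and the field names are hypothetical, and the selection strategy (half lowest-scoring rows, half random rows from the rest) is an assumption about one reasonable way to surface both application failures and scoring mistakes.

```python
import random
from dataclasses import dataclass


@dataclass
class EvalRow:
    """One evaluated example (hypothetical shape, not a real library type)."""
    input: str
    output: str
    expected: str
    score: float  # assumed to lie between 0.0 and 1.0


def pick_rows_to_review(rows, k=6, seed=0):
    """Return up to k rows: the k // 2 lowest-scoring ones first,
    followed by a random sample drawn from the remaining rows."""
    ranked = sorted(rows, key=lambda r: r.score)
    n_low = min(len(ranked), k // 2)
    lowest, rest = ranked[:n_low], ranked[n_low:]
    rng = random.Random(seed)  # fixed seed so the review set is reproducible
    sampled = rng.sample(rest, min(len(rest), k - n_low))
    return lowest + sampled


if __name__ == "__main__":
    # Toy data standing in for real evaluation results.
    demo = [EvalRow(f"q{i}", f"a{i}", f"e{i}", i / 10) for i in range(10)]
    for row in pick_rows_to_review(demo):
        print(f"score={row.score:.1f}  input={row.input!r}  "
              f"output={row.output!r}  expected={row.expected!r}")
```

Reading the low-scoring rows shows where the application fails, while the random rows help reveal cases where a high score hides a poor output, which would point to a scoring problem rather than an application problem.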