Evaluations are a crucial part of building production-grade AI products, and each one consists of three parts: data (test examples), a task (the application under test), and scores (functions that grade its outputs). Improving an evaluation means codifying your understanding of what makes a good response into scoring functions and gathering test examples that reflect real usage. There are three ways to do this: identify new, useful evaluators; improve existing scorers by adding context or precision; and add new test cases to the dataset. Together, these methods establish a feedback loop that shows the impact of each change to your AI application and ultimately leads to better product development.
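
To make the data/task/scores structure concrete, here is a minimal sketch in plain Python. The names (`Example`, `exact_match`, `run_eval`) are illustrative placeholders, not the API of any particular eval framework; in practice the task would call your actual application and the scorers would encode your own notion of a good response.

```python
# A minimal sketch of the data / task / scores structure.
# All names here are illustrative, not tied to a specific library.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    input: str      # what the application receives
    expected: str   # a reference answer used by the scorers


def exact_match(output: str, expected: str) -> float:
    """A simple scorer: 1.0 if the output matches the reference exactly, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    dataset: list[Example],
    task: Callable[[str], str],
    scorers: dict[str, Callable[[str, str], float]],
) -> dict[str, float]:
    """Run the task over every example and average each scorer's results."""
    totals = {name: 0.0 for name in scorers}
    for example in dataset:
        output = task(example.input)
        for name, scorer in scorers.items():
            totals[name] += scorer(output, example.expected)
    return {name: total / len(dataset) for name, total in totals.items()}


# Usage: swap in your real application for `task`, then compare these averages
# before and after each change to close the feedback loop.
dataset = [Example(input="2 + 2", expected="4")]
results = run_eval(dataset, task=lambda q: "4", scorers={"exact_match": exact_match})
print(results)  # {'exact_match': 1.0}
```

Under this framing, the three improvement approaches map directly onto the code: add a new entry to `scorers`, refine an existing scorer function, or append more examples to `dataset`.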