Building reliable AI apps is challenging because best practices for testing and evaluation are still lacking. Traditional software development practices, such as setting up CI/CD and writing tests, do not directly apply to AI app development. A good evaluation system is crucial, but it is often unclear how to create one. The typical journey from prototype to production involves manual testing, getting friends to test, fixing bugs, adding features, and finally using an evaluation script to automate testing and validation. Braintrust provides libraries and a web UI to make evaluating AI apps easy, saving time and improving developer iteration speed. With this tool, teams can set up their evaluation workflow in under 10 minutes and focus on building the fun parts of AI apps.
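
To make the idea of an evaluation script concrete, here is a minimal sketch in plain Python. It is not Braintrust's API. The names `Case`, `exact_match`, `run_eval`, and `toy_app` are hypothetical and chosen only for illustration, and `toy_app` stands in for a real call to a model or AI app.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One evaluation example: an input and the answer we expect."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected answer, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    task: Callable[[str], str],
    cases: list[Case],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Run the task on every case and return the mean score."""
    if not cases:
        raise ValueError("need at least one case")
    scores = [scorer(task(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)


def toy_app(question: str) -> str:
    # Hypothetical stand-in for a call to a model or AI app.
    answers = {"capital of france?": "Paris", "2 + 2?": "4"}
    return answers.get(question.lower(), "I don't know")


# Assumed example data; a real suite would use cases collected from actual usage.
CASES = [
    Case("Capital of France?", "paris"),
    Case("2 + 2?", "4"),
    Case("Capital of Peru?", "Lima"),
]

if __name__ == "__main__":
    print(f"mean score: {run_eval(toy_app, CASES):.2f}")
```

Running a script like this after every change replaces the manual testing step with a repeatable score. Exact match is only one possible scorer; fuzzier outputs would likely need a different scoring function passed as `scorer`.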