We're launching the ecosystem's best support for evaluating multi-call chains. You can now evaluate multi-stage workflows that make many calls to LLMs and functions, both end-to-end and at any individual stage of the chain. The feature is fully compatible with the LangSmith SDK, so getting started is easy. We've also added test case tagging, JSON schema validation evaluators, a comparison diff view, and several UX improvements: updates to tracing and to the custom evaluator creation flow, plus visual enhancements.
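To show what evaluating a chain "end-to-end and at any stage" means, here is a minimal, framework-agnostic sketch. It does not use the LangSmith SDK or this product's API. `Stage`, `run_chain`, and `evaluate` are hypothetical names chosen for illustration, and the LLM call is stubbed with a plain function.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical types for illustration only; not the product's or LangSmith's API.
Evaluator = Callable[[Any], bool]


@dataclass
class Stage:
    name: str
    fn: Callable[[Any], Any]


def run_chain(stages: list[Stage], chain_input: Any) -> dict[str, Any]:
    """Run stages in order, recording each stage's output under its name."""
    outputs: dict[str, Any] = {}
    value = chain_input
    for stage in stages:
        value = stage.fn(value)
        outputs[stage.name] = value
    return outputs


def evaluate(outputs: dict[str, Any],
             stage_evals: dict[str, Evaluator],
             final_eval: Evaluator,
             final_stage: str) -> dict[str, bool]:
    """Score selected intermediate stages plus the end-to-end result."""
    results = {name: ev(outputs[name]) for name, ev in stage_evals.items()}
    results["end_to_end"] = final_eval(outputs[final_stage])
    return results


# Stub stages: a "retrieval" function and a fake "LLM" step that returns structured output.
stages = [
    Stage("retrieve", lambda q: ["Paris is the capital of France."]),
    Stage("answer", lambda docs: {"answer": "Paris", "sources": len(docs)}),
]

outputs = run_chain(stages, "What is the capital of France?")
scores = evaluate(
    outputs,
    stage_evals={"retrieve": lambda docs: len(docs) > 0},
    final_eval=lambda out: isinstance(out, dict) and out.get("answer") == "Paris",
    final_stage="answer",
)
print(scores)  # {'retrieve': True, 'end_to_end': True}
```

Per-stage scoring depends on recording every intermediate output. Once those outputs are recorded, an evaluator can be attached to any stage without rerunning the chain.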