Researchers from UC Berkeley have developed a new tool called SPADE (System for Prompt Analysis and Delta-based Evaluation) to help organizations evaluate the outputs of large language models (LLMs) in automated pipelines or chains. The tool analyzes the refinements developers make to their prompts and automatically recommends corresponding evaluation functions, making it easier to monitor LLM responses and improve deployment reliability. SPADE is currently available as a prototype, and the researchers are seeking feedback from users.
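
To make the idea concrete, here is a minimal sketch of the kind of evaluation function such a tool might recommend from a prompt refinement. The refinement, function name, and checks below are hypothetical illustrations, not SPADE's actual output or API.

```python
# Hypothetical example: suppose a prompt was refined to add the instruction
# "Answer in under 100 words and include at least one citation in [brackets]."
# A recommended evaluation function might check those two properties on each response.
import re


def evaluate_response(response: str) -> dict:
    """Check an LLM response against constraints implied by the prompt refinement."""
    word_count = len(response.split())
    checks = {
        # The refinement asked for answers under 100 words.
        "under_100_words": word_count < 100,
        # The refinement asked for at least one bracketed citation, e.g. "[1]".
        "has_citation": bool(re.search(r"\[[^\]]+\]", response)),
    }
    checks["passed"] = all(checks.values())
    return checks


if __name__ == "__main__":
    sample = "Vector databases store embeddings for fast similarity search [1]."
    print(evaluate_response(sample))
    # {'under_100_words': True, 'has_citation': True, 'passed': True}
```

In practice, a function like this would run on every response flowing through the pipeline, flagging outputs that violate the constraints a developer encoded when refining the prompt.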