Confident AI's JudgementalGPT is an LLM agent for evaluating other LLM applications, built on OpenAI's Assistants API. It aims to deliver more accurate and reliable results than state-of-the-art approaches such as G-Eval. LLM-based evaluation suffers from well-known limitations — unreliability, inaccuracy, and bias — which can be mitigated by using multiple evaluators, each tailored to the evaluation task at hand.

JudgementalGPT acts as a proxy over multiple assistants: it routes each request to an evaluator suited to the task, including assistants that account for tasks prone to logical fallacies, and it refines its guidance based on user feedback. Despite these advantages, problems with LLM-based evaluation still linger, including accuracy challenges that stem from single-digit scoring scales and the intricacy of defining evaluators well. The key to building a better evaluator lies in tailoring it to specific use cases, leveraging OpenAI's Assistants API and its code interpreter functionality.
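The proxy-over-multiple-evaluators idea can be sketched as a simple dispatcher that picks a task-specific evaluator configuration. This is a minimal illustration, not Confident AI's actual implementation: the class, registry, and rubric text below are all hypothetical, and in practice each configuration would back an assistant created via OpenAI's Assistants API (with the `code_interpreter` tool enabled where computation is needed).

```python
# Hypothetical sketch of a proxy that routes an evaluation task to one of
# several task-specific evaluator configurations. All names here
# (EvaluatorConfig, route_evaluator, the rubric strings) are illustrative.
from dataclasses import dataclass


@dataclass
class EvaluatorConfig:
    name: str
    instructions: str
    use_code_interpreter: bool  # enable for tasks that need computation


# Registry of evaluators, each tailored to one evaluation task.
EVALUATORS = {
    "summarization": EvaluatorConfig(
        name="summarization-judge",
        instructions="Score the summary for coverage and faithfulness.",
        use_code_interpreter=False,
    ),
    "reasoning": EvaluatorConfig(
        name="reasoning-judge",
        instructions="Check each step for logical fallacies before scoring.",
        use_code_interpreter=True,
    ),
}


def route_evaluator(task_type: str) -> EvaluatorConfig:
    """Pick the evaluator suited to the task — the 'proxy' step."""
    try:
        return EVALUATORS[task_type]
    except KeyError:
        raise ValueError(f"No evaluator registered for: {task_type!r}")


config = route_evaluator("reasoning")
print(config.name)                   # reasoning-judge
print(config.use_code_interpreter)   # True
```

Routing by task type is what lets a logic-heavy task get an evaluator that scrutinizes reasoning steps, while a summarization task gets one focused on coverage and faithfulness.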