When teams deploy RAG (Retrieval-Augmented Generation) systems to production, they need to know how those systems will handle the diverse queries that arrive in the wild. A robust evaluation pipeline built on synthetic data addresses this. In this post, we'll walk through building an end-to-end evaluation pipeline using synthetic data generation with Gretel's Data Designer, an approach that lets us systematically test different aspects of a RAG system and surface performance gaps and trade-offs. By generating diverse, comprehensive test sets and evaluating the system across multiple configurations, we can build confidence that it will handle the unexpected queries that inevitably arise in production.

The pipeline consists of four main components: (1) data ingestion and processing, (2) setting up the vector store, (3) synthetic data generation with Gretel, and (4) evaluation and visualization. Compared with writing test cases by hand, this approach can save weeks of manual work while improving test coverage.
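To make the four stages concrete before we dive in, here is a minimal, self-contained sketch of the pipeline skeleton. Everything in it is illustrative: the function names, the `{"question", "source_chunk"}` test-case shape, and the bag-of-words "embeddings" (standing in for a real embedding model and vector database) are assumptions for this sketch, not APIs from Gretel or any other library. Stage 3 is represented only by the `test_set` input, since synthetic generation with Data Designer is covered later.

```python
# Illustrative pipeline skeleton -- all names here are placeholders,
# not real library APIs. Bag-of-words vectors stand in for a neural
# embedding model so the sketch runs with no external dependencies.
from collections import Counter
import math


def ingest_documents(docs: list[str], chunk_size: int = 200) -> list[str]:
    """Stage 1: split raw documents into fixed-size chunks."""
    return [doc[i:i + chunk_size]
            for doc in docs
            for i in range(0, len(doc), chunk_size)]


def build_vector_store(chunks: list[str]) -> list[tuple[Counter, str]]:
    """Stage 2: 'embed' each chunk as a term-frequency vector.
    A real pipeline would use an embedding model and a vector DB."""
    return [(Counter(chunk.lower().split()), chunk) for chunk in chunks]


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(store: list[tuple[Counter, str]], query: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = Counter(query.lower().split())
    ranked = sorted(store, key=lambda entry: _cosine(query_vec, entry[0]),
                    reverse=True)
    return [chunk for _, chunk in ranked[:k]]


def evaluate(store: list[tuple[Counter, str]], test_set: list[dict]) -> float:
    """Stage 4: retrieval hit rate -- the fraction of synthetic questions
    whose source chunk appears in the top-k retrieved results."""
    hits = sum(case["source_chunk"] in retrieve(store, case["question"])
               for case in test_set)
    return hits / len(test_set)


# Typical flow: chunk the corpus, index it, then score each configuration.
# Stage 3 (omitted) would synthesize the question/source_chunk pairs,
# e.g. with Gretel's Data Designer:
#   store = build_vector_store(ingest_documents(docs))
#   hit_rate = evaluate(store, test_set)
```

The key design point the sketch illustrates is that each stage exposes a narrow interface, so we can swap in different chunkers, embedding models, or retrieval settings and re-run the same evaluation to compare configurations.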