WP Engine is a large web hosting platform that hosts more than 1.5 million websites in over 150 countries, serving 175,000 customers and processing 5.2 billion requests per day. When its monitoring solution went down during an outage, the company needed a replacement that could handle its data volume on a global level. WP Engine built a custom observability platform using Telegraf for data collection, Kubernetes for aggregation, Kapacitor and InfluxDB OSS for alerting, Google Pub/Sub for data transfer, and Chronograf and Grafana for visualization. The new system ingests 5 million points every minute, twenty times more metrics than the old system, and stores that data in multiple locations, which eliminates the single point of failure and cuts human cognitive overhead in half.
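To put the quoted volumes on a common scale, the short sketch below converts them to average per-second rates. It uses only the two figures stated above (5.2 billion requests per day and 5 million points per minute); the helper name `per_second` is illustrative, and the results are flat averages that say nothing about peak load or how traffic is distributed over the day.

```python
# Figures quoted in the summary above.
REQUESTS_PER_DAY = 5_200_000_000
POINTS_PER_MINUTE = 5_000_000

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(total: float, window_seconds: int) -> float:
    """Average rate per second for a total observed over a time window."""
    return total / window_seconds


# Average rates; real traffic is bursty, so peaks will be higher.
requests_per_second = per_second(REQUESTS_PER_DAY, SECONDS_PER_DAY)
points_per_second = per_second(POINTS_PER_MINUTE, SECONDS_PER_MINUTE)

# Total metric points ingested over a full day at the quoted rate.
points_per_day = POINTS_PER_MINUTE * 60 * 24

print(f"{requests_per_second:,.0f} requests/s")  # 60,185 requests/s
print(f"{points_per_second:,.0f} points/s")      # 83,333 points/s
print(f"{points_per_day:,} points/day")          # 7,200,000,000 points/day
```

On these averages, the platform ingests somewhat more metric points per second than it serves web requests per second, which gives a sense of why the collection and transfer layers had to scale independently of any single monitoring host.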