Company:
Date Published:
Author: Dustin Deus
Word count: 2631
Language: English
Hacker News points: 3

Summary

GraphQL schema usage data is crucial for understanding how a GraphQL API is being used, especially in federated architectures where multiple services contribute to the overall schema. Managing this data at scale means collecting and processing it efficiently.

We've built a system that handles high data volume, throughput, and latency through batching, queuing, and regional deployment. Kafka acts as a buffer that absorbs client-side traffic spikes: we can accept data at a much higher rate than ClickHouse can ingest, while consuming it from the queue in large batches at a constant rate that ClickHouse can handle. We've also implemented a real-time streaming ETL pipeline using ClickPipes, which eliminates the need to implement and maintain a custom ETL pipeline from scratch.

The system is highly available and handles billions of requests per day, with horizontal scaling, regional deployment, and global load balancing. Observability and monitoring are equally important: we instrument metrics and define alerting rules that notify us of anomalies or issues.
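The buffering pattern described above (accept events at a high rate, flush downstream in large batches) can be sketched with a simple in-memory queue. This is only an illustration of the technique, not the actual implementation; the `BatchIngestor` class, its parameters, and the list-based sink standing in for a ClickHouse batch INSERT are all hypothetical.

```python
import queue
import time

class BatchIngestor:
    """Accepts events at a high rate and flushes them downstream in
    large batches -- the same buffering pattern the article describes
    with Kafka sitting in front of ClickHouse."""

    def __init__(self, sink, max_batch=1000, flush_interval=1.0):
        self._queue = queue.Queue()      # absorbs traffic spikes
        self._sink = sink                # stand-in for a batch INSERT
        self._max_batch = max_batch
        self._flush_interval = flush_interval

    def accept(self, event):
        # Producer side: enqueue without waiting for the database.
        self._queue.put(event)

    def drain(self):
        # Consumer side: collect up to max_batch events, or whatever
        # arrived within flush_interval, then write them as one batch.
        batch = []
        deadline = time.monotonic() + self._flush_interval
        while len(batch) < self._max_batch:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(self._queue.get(timeout=timeout))
            except queue.Empty:
                break
        if batch:
            self._sink(batch)
        return len(batch)

# A burst of 200 events arrives, but the sink sees a single batch.
batches = []
ingestor = BatchIngestor(sink=batches.append, max_batch=500)
for i in range(200):
    ingestor.accept({"field": "Query.user", "count": i})
ingestor.drain()
print(len(batches), len(batches[0]))  # -> 1 200
```

In production the queue would be Kafka (durable, replicated, consumable from multiple regions), and the sink would be a bulk insert into ClickHouse, but the decoupling of accept rate from ingest rate is the same.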