Lessons learned from running a large gRPC mesh at Datadog
Datadog uses gRPC, an open source RPC framework, for efficient communication between its distributed services. Running a large mesh of gRPC services in a high-scale Kubernetes environment has taught the company several lessons, which it distills into best practices: use Kubernetes headless services with gRPC's client-side round-robin load balancing to avoid load-balancing problems; use TLS to cope with IP address recycling; set MAX_CONNECTION_AGE so that clients periodically reconnect and re-resolve DNS; make sure clients detect scale-out events; set the keepalive channel option to mitigate the effects of silent connection drops; and monitor services properly.
Company
Datadog
Date published
April 22, 2024
Author(s)
Nicholas Thomson, Antoine Tollenaere
Word count
2732
Language
English