
Lessons learned from running a large gRPC mesh at Datadog

What's this blog post about?

Datadog uses gRPC, an open-source RPC framework, for efficient communication between its distributed services. From running a large mesh of gRPC services in a high-scale Kubernetes environment, the company has identified several best practices for using gRPC effectively: use Kubernetes headless services together with gRPC round-robin load balancing to avoid load-balancing issues; use TLS to handle IP recycling; set MAX_CONNECTION_AGE to force gRPC to re-resolve from DNS; ensure that scale-outs are detected properly; set the keepalive channel option to mitigate the effects of silent connection drops; and monitor services properly.
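To make the settings named in the summary more concrete, here is a minimal sketch of how they might be expressed as gRPC channel arguments in Python. The option keys are standard gRPC core channel-argument names; the target address, the helper function names, and every numeric value are illustrative assumptions, not figures taken from the Datadog post.

```python
import json

# Hypothetical headless-service address. The "dns:///" scheme asks the gRPC
# client to resolve every pod IP behind the name instead of one virtual IP.
TARGET = "dns:///my-service.my-namespace.svc.cluster.local:50051"


def client_channel_options(keepalive_time_ms=10_000, keepalive_timeout_ms=5_000):
    """Client-side options: round-robin balancing plus keepalive pings.

    The default values are placeholders chosen for illustration only.
    """
    # Service config selecting the round_robin policy across resolved addresses.
    service_config = json.dumps({"loadBalancingConfig": [{"round_robin": {}}]})
    return [
        ("grpc.service_config", service_config),
        # Send a keepalive ping after this much inactivity on the transport.
        ("grpc.keepalive_time_ms", keepalive_time_ms),
        # Consider the connection dead if the ping is not acknowledged in time.
        ("grpc.keepalive_timeout_ms", keepalive_timeout_ms),
    ]


def server_options(max_connection_age_ms=300_000, grace_ms=30_000):
    """Server-side options: cap connection lifetime so clients reconnect
    and re-resolve DNS. Values are again assumptions."""
    return [
        ("grpc.max_connection_age_ms", max_connection_age_ms),
        # Extra time granted to in-flight RPCs before the connection is closed.
        ("grpc.max_connection_age_grace_ms", grace_ms),
    ]


if __name__ == "__main__":
    for key, value in client_channel_options() + server_options():
        print(f"{key} = {value}")
```

With the grpcio package installed, these lists could be passed as `grpc.secure_channel(TARGET, credentials, options=client_channel_options())` on the client and `grpc.server(executor, options=server_options())` on the server, where `credentials` and `executor` are objects you would construct yourself. Appropriate values depend on the workload and on any proxies in the path.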

Company
Datadog

Date published
April 22, 2024

Author(s)
Nicholas Thomson, Antoine Tollenaere

Word count
2732

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.