It's always DNS . . . except when it's not: A deep dive through gRPC, Kubernetes, and AWS networking
This post describes a series of network issues that occurred during updates to a critical service. DNS errors were the initial suspect, but further investigation uncovered more complex problems involving dropped packets, connection tracking, and gRPC client reconnect algorithms. Through extensive analysis and testing, the team found that the root cause was an aggressive gRPC reconnect parameter that triggered a SYN flood during rollouts. Addressing this parameter resolved the incident and gave the team valuable insight into their network infrastructure.
Company
Datadog
Date published
April 13, 2022
Author(s)
Laurent Bernaille, David Lentz
Word count
3700
Hacker News points
7
Language
English