
Not Just Another Network Latency Issue: How We Unraveled a Series of Hidden Bottlenecks

What's this blog post about?

An engineering team faced a high-urgency issue with an application in their usage estimation service, which caused an excessive backlog and alert fatigue from frequent pages about startup latency. The problem was resolved after the team addressed four separate issues, including a misconfiguration in the network proxy and a Linux kernel bug. To investigate and resolve these issues in production, they followed system-level metrics and inspected each component in the network path. This process improved the observability of their network, leading to significant cost savings and better visibility for future troubleshooting.

Company
Datadog

Date published
May 26, 2023

Author(s)
Anatole Beuzon, Bowen Chen

Word count
1584

Language
English

Hacker News points
None found.

