A deep dive into investigating a complex denial-of-service attack
On April 19, 2024, Datadog was hit by a meticulously crafted denial-of-service (DoS) attack that targeted several of its regions. The incident unfolded over the following days as the company worked to understand and mitigate the root cause. The attack exploited a vulnerability in Google Cloud Platform's Classic Application Load Balancer, which allowed malicious requests to be forwarded with a body but no "Content-Length" header. Because the header was missing, Envoy treated each such request as having no body and left the payload unprocessed in the connection buffer. Datadog detected the problem almost immediately through its SLOs and monitors, which let it launch investigations quickly and quantify customer impact. The company engaged a wide array of teams to work on the problem together, which underscored the need for effective incident management processes and tooling. To mitigate the issue, Datadog gradually shifted more traffic to an L4 load balancer and worked with GCP to block the malicious traffic using Google Cloud Armor. The company has since implemented critical rules in its security management system and uses Application Security Management (ASM) to detect similar malicious requests in the future. The incident highlights the importance of cross-team collaboration, effective incident response processes, and continuous monitoring for potential vulnerabilities.
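The framing problem described above can be illustrated with a minimal sketch. This is not Envoy's parser or the actual attack traffic; it is a simplified HTTP/1.1 request splitter, written on the assumption that a request with no "Content-Length" header (and no chunked encoding, which is not handled here) has a zero-length body. The function name `split_request`, the request path, and the host are hypothetical.

```python
def split_request(buffer: bytes) -> tuple[bytes, bytes, bytes]:
    """Split one request off the front of a connection buffer.

    Returns (head, body, leftover). Simplified sketch: chunked
    transfer encoding is deliberately not handled.
    """
    head, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete request head")

    content_length = 0  # assumption: no Content-Length header means no body
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())

    return head, rest[:content_length], rest[content_length:]


# Hypothetical request: a body is present, but no Content-Length is sent.
raw = (
    b"POST /hypothetical/path HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"\r\n"
    b"payload-without-length"
)
head, body, leftover = split_request(raw)
print(body)      # b''
print(leftover)  # b'payload-without-length'
```

In this sketch the parser reports an empty body, and the payload bytes stay in `leftover`, which is the same shape as the "unprocessed in the connection buffer" condition the summary describes.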
Company
Datadog
Date published
Aug. 27, 2024
Author(s)
Ahmad Mustafa, Nicolas Picard, Théo Guidoux
Word count
2180
Language
English