Datadog experienced its first global outage on March 8th. Several hundred engineers worked in shifts, coordinating across multiple communication channels, to resolve the incident. This post describes Datadog's incident response process, including its monitoring systems, high-severity incident management, training, and blameless culture. The outage yielded lessons on improving internal response, customer communications, and overall preparedness for future incidents.