Cloudflare outage caused by bad software deploy (updated)

What's this blog post about?

On July 2, 2019, a bad software deploy caused a massive spike in CPU utilization across Cloudflare's network, and visitors to sites served by Cloudflare received 502 errors for about 30 minutes. The incident was not an attack. It was caused by a misconfigured rule within the Cloudflare Web Application Firewall (WAF), introduced during a routine deployment of new WAF Managed rules. The CPU spike resulted in a global network outage, with traffic dropping by 82% at its worst. Cloudflare resolved the issue by issuing a 'global termination' on the WAF Managed Rulesets, rolling back the problematic deployment, and re-enabling the rulesets after testing the fix. Cloudflare is reviewing its testing and deployment processes to prevent similar incidents in the future.

Company
Cloudflare

Date published
July 2, 2019

Author(s)
John Graham-Cumming

Word count
471

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.