Cloudflare outage caused by bad software deploy (updated)
On July 2, 2019, Cloudflare experienced a massive spike in CPU utilization due to a bad software deploy, causing visitors to their sites to receive 502 errors for about 30 minutes. The issue was resolved by rolling back the problematic deployment. This incident was not an attack and was caused by a misconfigured rule within the Cloudflare Web Application Firewall (WAF) during a routine deployment of new WAF Managed rules. The spike in CPU utilization resulted in global network outage, with traffic dropping by 82% at its worst. The issue was resolved by issuing a 'global termination' on the WAF Managed Rulesets and re-enabling them after testing the fix. Cloudflare is reviewing their testing and deployment processes to prevent similar incidents in the future.
Company
Cloudflare
Date published
July 2, 2019
Author(s)
John Graham-Cumming
Word count
471
Language
English
Hacker News points
348