Details of the Cloudflare outage on July 2, 2019
On July 2, 2019, Cloudflare experienced a global outage caused by a poorly written regular expression that exhausted every CPU core handling HTTP/HTTPS traffic across its network worldwide. During the incident, visitors saw a 502 error page when visiting any domain proxied by Cloudflare. The company has since taken measures to prevent a recurrence, including re-introducing the excessive CPU usage protection that had been mistakenly removed and adding performance profiling for all rules to the test suite. Additionally, it is switching to either the re2 or the Rust regex engine, both of which have run-time guarantees.
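To make the CPU exhaustion concrete, here is a minimal sketch of how a backtracking matcher can do far more work than the input length suggests. It is a toy written for illustration only, not Cloudflare's engine or rule: the function name `count_steps`, the token-list pattern format, and the example pattern `.*.*=.*` are all assumptions chosen for this sketch, and the step counts are those of this toy, not of any real regex engine.

```python
def count_steps(tokens, text):
    """Naive backtracking matcher for a tiny, hypothetical pattern language.

    tokens: a list in which ".*" means "any run of characters" and any
    other one-character string is a literal. The match is anchored at the
    start of text. Returns (matched, steps), where steps counts how many
    times the matcher was invoked.
    """
    steps = 0

    def match(ti, pos):
        nonlocal steps
        steps += 1
        if ti == len(tokens):
            return True  # every token consumed, so a prefix has matched
        tok = tokens[ti]
        if tok == ".*":
            # Greedy: try the longest run first, then back off one
            # character at a time. This loop is where backtracking happens.
            for end in range(len(text), pos - 1, -1):
                if match(ti + 1, end):
                    return True
            return False
        if pos < len(text) and text[pos] == tok:
            return match(ti + 1, pos + 1)
        return False

    return match(0, 0), steps


if __name__ == "__main__":
    pattern = [".*", ".*", "=", ".*"]  # toy form of .*.*=.*
    for n in (10, 20, 40, 80):
        matched, steps = count_steps(pattern, "x" * n)  # input has no "="
        print(n, matched, steps)
```

In this toy, doubling the length of an input that contains no `=` roughly quadruples the number of steps, because the two leading `.*` tokens try every way of splitting the text between them before giving up. Engines with run-time guarantees, such as re2 or the Rust regex engine, avoid this kind of backtracking and keep matching time linear in the input length.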
Company
Cloudflare
Date published
July 12, 2019
Author(s)
John Graham-Cumming
Word count
4537
Hacker News points
None found.
Language
English