Post Mortem: The Ugly, the Bad & the Good
On February 24, 2012, at around 7:30 GMT, a DNS update by Cloudflare caused an outage that affected some websites for approximately 30 minutes. The update rolled out new DNS infrastructure designed to make updates faster, but during the process it accidentally deleted the primary DNS database. It took Cloudflare about five minutes to recognize the issue, retrieve the backup, and push it to production. Some users saw longer downtime, either because their ISPs' recursive DNS resolvers had cached the bad results or because two data centers did not correctly apply all of the corrected DNS file updates. The company apologized for the incident and has since added safeguards to prevent similar occurrences.
Company
Cloudflare
Date published
Feb. 24, 2012
Author(s)
Matthew Prince
Word count
1059
Hacker News points
None found.
Language
English