Company
Date Published
Author
Roxie Elliott
Word count
882
Language
English
Hacker News points
None

Summary

The incident occurred on July 29 at 13:09 EST when the solid state disk in one of the core switches failed, triggering a failover to the backup routing engine. However, the failover was not completely successful, leaving some network devices unreachable and causing a portion of the network to be isolated. The recovery efforts were delayed due to a recently discovered problem with the software on the core switches, but ultimately, the affected switch was repaired and returned to production, restoring core switch redundancy. To prevent similar issues in the future, DigitalOcean implemented new procedures and made changes to their configuration and standard operating practices.