January 25th Outage
On January 25th, Gitpod experienced a global outage lasting over an hour, caused by a DNS failure within its cluster. As a result, workspaces could not start and existing ones experienced data loss. The team began investigating immediately and spun up new clusters as a precaution. They found that traffic could not reach Google's public DNS server (8.8.8.8) over UDP port 53, which caused timeout errors. After traffic was diverted to the new clusters, Gitpod resumed normal operations. The team is now focusing on improving data backup procedures and making DNS more resilient by using multiple DNS servers. As an apology for the outage, Gitpod is issuing credits to its customers.
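To illustrate the idea of relying on more than one DNS server, here is a minimal Python sketch of a client that sends a query over UDP port 53 to each resolver in turn and moves on when one times out. It is not Gitpod's implementation: the function names (`build_query`, `udp_query`, `query_with_fallback`), the 2-second timeout, and the second resolver address (1.1.1.1) are assumptions made for illustration only.

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS A-record query in RFC 1035 wire format."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is prefixed with its length, e.g. b"\x07example\x03com".
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in hostname.rstrip(".").split(".")
    )
    # Zero-length root label, then QTYPE=A (1) and QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def udp_query(server: str, payload: bytes, timeout: float) -> bytes:
    """Send one DNS query over UDP port 53 and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (server, 53))
        response, _ = sock.recvfrom(512)
        return response


def query_with_fallback(hostname, servers, timeout=2.0, send=udp_query):
    """Try each resolver in order; return (server, response) from the first
    one that answers. `send` is injectable so the logic can be tested
    without network access (an assumption of this sketch)."""
    payload = build_query(hostname)
    errors = {}
    for server in servers:
        try:
            return server, send(server, payload, timeout)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            errors[server] = exc
    raise RuntimeError(f"all resolvers failed: {errors}")
```

A caller might invoke it as `query_with_fallback("gitpod.io", ["8.8.8.8", "1.1.1.1"])`, where the resolver list is hypothetical. If the first address is unreachable on UDP port 53, the call waits for the timeout and then retries against the next address instead of failing outright.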
Company
Gitpod
Date published
Feb. 4, 2022
Author(s)
Pavel Tumik
Word count
1134
Language
English
Hacker News points
None found.