Intermittent downtime from repeated crashes
On November 18th, 2022, incident.io experienced a total of 13 minutes of intermittent downtime between 15:40 and 16:12 GMT, caused by repeated application crashes. The team investigated and traced the crashes to a bad Pub/Sub event. To recover, they purged several subscriptions and temporarily turned off non-critical parts of the app. To improve reliability afterwards, they added panic handlers so that all goroutines are covered, and physically split the application by work type, which also improved performance.
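To illustrate what covering goroutines with panic handlers can look like, here is a minimal Go sketch. It is not incident.io's actual code: the names `handleSafely` and `goSafely` are hypothetical, and the sketch assumes each unit of work is a function returning an error. The idea is that a panic inside a handler is recovered and turned into an ordinary error, so one bad event cannot take down the whole process.

```go
package main

import "fmt"

// handleSafely is a hypothetical wrapper: it runs handler and converts any
// panic into a returned error instead of letting it crash the process.
func handleSafely(handler func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Overwrite the named return value with the recovered panic.
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return handler()
}

// goSafely starts handler in a new goroutine with the same protection and
// reports the outcome on the returned buffered channel.
func goSafely(handler func() error) <-chan error {
	done := make(chan error, 1)
	go func() { done <- handleSafely(handler) }()
	return done
}
```

A caller might start work with `done := goSafely(processEvent)` (where `processEvent` is a placeholder for real handler code) and later read `<-done` to log or retry on failure. Note that `recover` only works in the goroutine that panicked, which is why every goroutine needs its own deferred handler.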
Company
Incident.io
Date published
Nov. 30, 2022
Author(s)
Lawrence Jones
Word count
2200
Language
English
Hacker News points
2