Company
incident.io
Date Published
Nov. 30, 2022
Author
Lawrence Jones
Word count
2200
Language
English
Hacker News points
2

Summary

On November 18th, 2022, incident.io experienced 13 minutes of intermittent downtime within a 32-minute window, from 15:40 to 16:12 GMT, caused by repeated crashes of its application. The team traced the crashes to a bad Pub/Sub event, then purged several subscriptions and temporarily turned off non-critical parts of the app to restore service. As follow-up work, they added panic handlers so that every goroutine is covered and physically split the application by work type for increased reliability and performance.