
Intermittent downtime from repeated crashes

What's this blog post about?

On November 18th, 2022, incident.io experienced 13 minutes of intermittent downtime, spread across a 32-minute window from 15:40 to 16:12 GMT, caused by repeated application crashes. The team traced the crashes to a bad Pub/Sub event, then purged several subscriptions and temporarily turned off non-critical parts of the app to restore service. As follow-up reliability work, they added panic handlers so that every goroutine is covered, and physically split the application by work type for better reliability and performance.

Company
incident.io

Date published
Nov. 30, 2022

Author(s)
Lawrence Jones

Word count
2200

Language
English

Hacker News points
2


By Matt Makai. 2021-2024.