Aug 28-29th incident: Reaffirming our commitment to reliable async messaging infrastructure

What's this blog post about?

Hookdeck experienced significant event delivery delays on August 28 and 29, caused by its database vendor's storage auto-scaling feature. The root cause was the provisioning of slower disks in place of high-performance ones. No data was lost, but the delays had severe consequences for many customers, which made significant changes necessary to maintain their trust. Hookdeck is revamping its infrastructure and will credit all customers in line with its SLA policy. It also plans to update its SLA to offer higher credit compensation, publish real-time p99 latency, work with the database vendor on improvements, and improve its communication during incidents.

Company
Hookdeck

Date published
Sept. 6, 2024

Author(s)
Alexandre Bouchard

Word count
595

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.