Battling database performance

What's this blog post about?

The post describes two weeks of intermittent database performance problems in an application, for which no clear cause was identified at first. During this period the team deployed a range of performance and observability changes: moving policy violations to a materialized view, adding new database indices, rewriting queries, and processing Slack events asynchronously. The problems persisted until improved observability made it possible to identify the problematic operations. The root cause was eventually traced to an unnecessary transaction opened during modal submissions in a Slack integration. Each of these transactions was short, but in combination they caused significant problems. Since that transaction was removed and the other performance improvements were made, the application has been free of database timeouts for four months.
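One way to see how many short transactions can add up is Little's law: the average number of database connections in use equals the request rate multiplied by the time each request holds a connection. The sketch below is only an illustration under stated assumptions. The pool size, request rate, and hold times are hypothetical and do not come from the post, and the idea that the problem showed up as connection pool pressure is likewise an assumption.

```python
# Hypothetical illustration: none of these numbers come from the post.

def avg_connections_held(requests_per_second: int, hold_ms: int) -> float:
    """Little's law: average connections in use = arrival rate * hold time."""
    return requests_per_second * hold_ms / 1000


POOL_SIZE = 20            # assumed connection pool size
REQUESTS_PER_SECOND = 50  # assumed rate of modal submissions

# Assumed case 1: a transaction wraps the whole handler, so the connection
# is held for the full 500 ms the handler takes.
wrapped = avg_connections_held(REQUESTS_PER_SECOND, hold_ms=500)

# Assumed case 2: no wrapping transaction, so the connection is held only
# for the 5 ms the query itself takes.
unwrapped = avg_connections_held(REQUESTS_PER_SECOND, hold_ms=5)

for label, held in [("wrapped", wrapped), ("unwrapped", unwrapped)]:
    status = "exceeds pool" if held > POOL_SIZE else "fits in pool"
    print(f"{label}: {held:.2f} connections on average ({status})")
```

Under these assumed numbers, the same request rate needs more connections than the pool has when each request holds one for the whole handler, and a small fraction of one connection when it does not.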

Company
Incident.io

Date published
April 20, 2023

Author(s)
Rory Bain

Word count
1933

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.