Exploring performance differences between Amazon Aurora and vanilla MySQL

Company

Plaid

Date Published

June 10, 2021

Author

Michael Troute

Word count

2070

Language

English

Hacker News points

137

URL

plaid.com/blog/exploring-performance-differences-between-amazon-aurora-and-vanilla-mysql

Summary

Amazon Aurora MySQL is a primary relational database for Plaid. In 2019, they noticed performance degradation on one of their Aurora clusters due to an errant long-running transaction on a read replica. This issue seemed to violate the assumption that read replica load should not impact writer performance. Digging deeper into the design and documentation of Aurora DB, Plaid identified the cause of the degradation and best practices to avoid it. In vanilla MySQL, transactions are isolated using multiversion concurrency control (MVCC), which allows each operation to behave as if it is the only operation running on the database. This is achieved through undo logs that can recreate prior states of rows by undoing updates. Long-running transactions can cause performance degradation for concurrent transactions, especially when querying heavily indexed tables with frequent changes. Aurora MySQL differs from vanilla MySQL in its persistence and replication models. In Aurora, compute (query execution) and storage (durably persisting data to disk) happen on different instances. Persistence and replication are both accomplished using the redo log. Reader and writer instances share a set of undo logs, which can lead to performance degradation when long-running read replica transactions occur. Mitigation strategies for this issue include reducing transaction isolation levels for replica instances, monitoring rollback segment history list length, performing potentially long-running reader queries on binlog replicas, exporting snapshots to S3 as Apache Parquet files, and using a clone to handle long-running reads.