Timeout. Let’s try this again. Tuning timeouts and retries at scale.

What's this blog post about?

The post describes two outages at Mux Video and the lessons learned from them. Outage 1 involved connection pools and timeouts: when latency increased, a misconfigured timeout caused all requests to be returned as errors. Outage 2 involved retries and timeouts, and highlights the importance of careful configuration and monitoring for optimal system performance. The author emphasizes designing systems that tolerate failures at every layer in order to achieve near-perfect uptime.

Company
Mux

Date published
Sept. 21, 2020

Author(s)
Matt Ward

Word count
1741

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.