Timeout. Let’s try this again. Tuning timeouts and retries at scale.
This post discusses two outages experienced by Mux Video and the lessons learned from them. Outage 1 involved connection pools and timeouts: when latency increased, a misconfigured timeout caused every request to be returned as an error. Outage 2 involved retries and timeouts, and showed why both need careful configuration and monitoring. The author emphasizes designing systems that can tolerate failures at every layer in order to achieve near-perfect uptime.
Company
Mux
Date published
Sept. 21, 2020
Author(s)
Matt Ward
Word count
1741
Hacker News points
None found.
Language
English