Service, (Un)interrupted: How We Made a Non-EC2 Component Highly Available
In the book "High Availability: Design, Techniques, and Processes," Floyd Piedad stresses that availability should be judged from the user's perspective: a highly available system delivers a consistent level of operational performance over a given period of time. Three principles of reliability engineering help achieve this: eliminating single points of failure, providing reliable crossover to redundant resources, and detecting failures early. This BrowserStack case study shows how those principles were applied to make a non-EC2 component highly available. By adding redundancy, implementing health checks, and configuring Route 53, the team achieved both inter-region and intra-region high availability, with load balancing across the Tweaker machines as an added benefit.
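The three principles can be illustrated with a small sketch of health-checked failover. This is not BrowserStack's implementation and it does not call Route 53; it is a minimal in-memory model, assuming a set of redundant endpoints spread over two regions and a caller-supplied health probe. The names `Endpoint`, `resolve`, and the region labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str    # hypothetical machine name
    region: str  # hypothetical region label


def resolve(
    endpoints: List[Endpoint],
    probe: Callable[[Endpoint], bool],
    preferred_region: str,
    counter: int,
) -> Optional[Endpoint]:
    """Pick an endpoint for one request.

    1. Failure detection: keep only endpoints whose probe succeeds.
    2. Intra-region redundancy: prefer healthy endpoints in the
       preferred region, rotating among them by `counter`.
    3. Inter-region crossover: if none are healthy there, fall back
       to healthy endpoints in any other region.
    Returns None when nothing is healthy.
    """
    healthy = [e for e in endpoints if probe(e)]
    local = [e for e in healthy if e.region == preferred_region]
    pool = local or healthy
    if not pool:
        return None
    # Simple round robin over the chosen pool (assumed policy).
    return pool[counter % len(pool)]
```

With two endpoints in the preferred region and one elsewhere, requests alternate between the two local endpoints, collapse onto the survivor when one fails, and cross over to the other region only when both local endpoints are unhealthy.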
Company
BrowserStack
Date published
Aug. 28, 2019
Author(s)
Raj Patel
Word count
1215
Hacker News points
None found.
Language
English