Service, (Un)interrupted: How We Made a Non-EC2 Component Highly Available

Post Details

Company

BrowserStack

Date Published

Aug. 28, 2019

Author

Raj Patel

Word Count

1,215

Language

English

Hacker News Points

-

Source URL

www.browserstack.com/blog/service-uninterrupted-how-we-made-non-ec2-component-highly-available

Summary

In the book "High Availability: Design, Techniques, and Processes," Floyd Piedad emphasizes that system availability should be judged from the user's perspective. A highly available system delivers its expected level of operational performance consistently over a given period of time. Three principles of reliability engineering help achieve this: removing single points of failure, providing reliable crossover to redundant resources, and detecting failures early. This BrowserStack case study shows how those principles were applied to make a non-EC2 component highly available. By adding redundancy, implementing health checks, and using Amazon Route 53 for DNS configuration, the team achieved both inter-region and intra-region high availability, with the added benefit of load balancing on Tweaker machines.
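
The three principles can be illustrated with a minimal sketch. It is not BrowserStack's implementation and does not call Route 53; the `Endpoint` class, the `record_probe` and `pick_endpoint` helpers, the region names, and the threshold of three consecutive failures are all assumptions made for illustration. Redundant endpoints remove the single point of failure, a failure counter stands in for a health check, and selection crosses over to another endpoint in the same region first and to another region only when none is healthy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str          # hypothetical machine label
    region: str        # hypothetical region label
    failures: int = 0  # consecutive failed health probes

    def healthy(self, threshold: int) -> bool:
        # An endpoint counts as healthy until it reaches the failure threshold.
        return self.failures < threshold


def record_probe(endpoint: Endpoint, ok: bool) -> None:
    # A successful probe resets the streak; a failed probe extends it.
    endpoint.failures = 0 if ok else endpoint.failures + 1


def pick_endpoint(
    endpoints: List[Endpoint], preferred_region: str, threshold: int = 3
) -> Optional[Endpoint]:
    # Prefer a healthy endpoint in the caller's region (intra-region failover),
    # then any healthy endpoint elsewhere (inter-region failover).
    healthy = [e for e in endpoints if e.healthy(threshold)]
    same_region = [e for e in healthy if e.region == preferred_region]
    candidates = same_region or healthy
    return candidates[0] if candidates else None


if __name__ == "__main__":
    pool = [
        Endpoint("a1", "us-east"),
        Endpoint("a2", "us-east"),
        Endpoint("b1", "eu-west"),
    ]
    for _ in range(3):
        record_probe(pool[0], ok=False)  # a1 fails three probes in a row
    chosen = pick_endpoint(pool, "us-east")
    print(chosen.name if chosen else "no healthy endpoint")
```

In a real deployment the probing and the crossover would typically be delegated to the DNS or load-balancing layer rather than written by hand; the sketch only shows the decision logic those layers apply.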