Andrea Echstenkamper


In June, the Test in Production Meetup focused on chaos engineering, with a talk by Nora Jones, Senior Software Engineer at Netflix. Chaos engineering is the discipline of experimenting on production systems to find vulnerabilities before they turn into outages for customers. At Netflix, this is done with a tool called ChAP (Chaos Automation Platform), which lets users inject failures into services and test their assumptions about how those services behave. The goal is to improve availability by proactively finding vulnerabilities using live production traffic. Safety and monitoring are crucial when testing in production: safety mechanisms contain the blast radius of an experiment, and good observability makes its effects visible. ChAP has found numerous vulnerabilities within the Netflix ecosystem, leading to improvements in service resilience and customer experience.
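To make the idea of a contained experiment more concrete, here is a minimal Python sketch of failure injection with a limited blast radius. It is not ChAP's API or Netflix's implementation. Everything in it is an assumption made for illustration: the names `call_service` and `run_experiment`, the simulated success rates, and the thresholds. It routes a small fraction of simulated traffic into a control group and a treatment group, injects a failure only into the treatment group, and aborts as soon as the treatment group's error rate exceeds the control group's by more than an allowed gap.

```python
import random
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    control_error_rate: float
    treatment_error_rate: float
    aborted: bool
    requests_seen: int
    requests_in_experiment: int


def call_service(inject_failure: bool, rng: random.Random) -> bool:
    """Hypothetical service call; returns True on success.

    The success rates are made-up numbers for this simulation.
    """
    if inject_failure:
        # Injected fault: assume the fallback path only succeeds 70% of the time.
        return rng.random() < 0.70
    return rng.random() < 0.99


def error_rate(requests: int, errors: int) -> float:
    return errors / requests if requests else 0.0


def run_experiment(
    total_requests: int = 10_000,
    sample_fraction: float = 0.01,  # share of traffic per group (blast radius)
    max_error_gap: float = 0.05,    # abort when treatment is this much worse
    min_samples: int = 50,          # wait for some data before judging
    seed: int = 0,
) -> ExperimentResult:
    rng = random.Random(seed)
    # Per group: [requests, errors]
    counts = {"control": [0, 0], "treatment": [0, 0]}
    aborted = False
    seen = 0

    for seen in range(1, total_requests + 1):
        roll = rng.random()
        if roll < sample_fraction:
            group = "treatment"
        elif roll < 2 * sample_fraction:
            group = "control"
        else:
            continue  # most traffic is never part of the experiment

        ok = call_service(inject_failure=(group == "treatment"), rng=rng)
        counts[group][0] += 1
        if not ok:
            counts[group][1] += 1

        c_req, c_err = counts["control"]
        t_req, t_err = counts["treatment"]
        if c_req >= min_samples and t_req >= min_samples:
            gap = error_rate(t_req, t_err) - error_rate(c_req, c_err)
            if gap > max_error_gap:
                aborted = True  # stop injecting failures immediately
                break

    c_req, c_err = counts["control"]
    t_req, t_err = counts["treatment"]
    return ExperimentResult(
        control_error_rate=error_rate(c_req, c_err),
        treatment_error_rate=error_rate(t_req, t_err),
        aborted=aborted,
        requests_seen=seen,
        requests_in_experiment=c_req + t_req,
    )
```

Calling `run_experiment()` returns an `ExperimentResult` whose `aborted` flag shows whether the safety check stopped the run early. The control group is what makes the comparison meaningful: both groups see the same kind of traffic, so a gap between their error rates can be attributed to the injected failure rather than to background noise.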