Chaos Engineering and Continuous Verification in Production
1. The myth of the bad apple is the belief that a single individual is responsible for problems in an organization. It ignores that failures in complex systems emerge from the system as a whole, which needs adaptive capacity to handle incidents.
2. Continuous verification (CV) is a proactive experimentation tool that verifies system behaviors, as opposed to reactive testing methodologies that validate known properties.
3. Documented best practices and runbooks can be valuable for communication, but they do not improve reliability in complex systems, because they cannot supply the context or skill needed to improvise effectively during an incident.
4. Resilience is the adaptive capacity to handle incidents. It requires a level of improvisation and human adaptation that automated systems currently cannot replicate.
5. Self-healing algorithms can raise the floor of robustness in software systems, but they do not improve resilience: they rely on predetermined conditions and cannot account for the interactions within complex systems.
6. Feature flags are a powerful tool for managing dependencies and improving feature velocity. They let developers make design decisions and change their minds quickly without causing major disruptions in production.
7. Effective incident response involves studying resilience engineering and its practices, such as blameless postmortems or learning reviews, and understanding that complex systems require different approaches to navigating dependencies and managing risk.
8. Chaos engineering can improve system reliability by encouraging teams to discuss potential failure points and to run experiments in staging environments before moving them into production.
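To illustrate the point about feature flags (item 6), here is a minimal sketch of a flag guarding a new code path. Everything in it is hypothetical: `FlagStore`, the flag name `new-discount-engine`, and `checkout_total` are invented for illustration and are not taken from the source or from any real SDK.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store; not a real SDK."""
    flags: dict = field(default_factory=dict)

    def set(self, name, enabled):
        # Flip a flag at runtime, with no redeploy needed.
        self.flags[name] = enabled

    def is_enabled(self, name, default=False):
        # Unknown flags fall back to the safe default (off).
        return self.flags.get(name, default)


def checkout_total(prices, store):
    """Sum prices; apply the new 10% discount path only when flagged on."""
    subtotal = sum(prices)
    if store.is_enabled("new-discount-engine"):
        return round(subtotal * 0.9, 2)  # new behavior behind the flag
    return subtotal  # existing behavior stays the default


store = FlagStore()

# Turn the new path on, then change our minds and turn it off again.
store.set("new-discount-engine", True)
with_flag = checkout_total([10.0, 20.0], store)

store.set("new-discount-engine", False)
without_flag = checkout_total([10.0, 20.0], store)
```

The design choice worth noting is that the old path remains the default, so switching the flag off restores the previous behavior without a code change.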
Company: LaunchDarkly
Date published: May 21, 2020
Author(s): Matt DeLaney
Word count: 8744
Hacker News points: 3
Language: English