Reliability Review Q1 2022
Buildkite, a software development tool used by thousands of teams worldwide, conducted a reliability review in Q1 2022 following several reliability incidents in late 2021. The company is now defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to better understand customer expectations and improve the product's reliability. It has also introduced error budgets: when a budget is exhausted, the team must stop feature work and focus on reliability. Additionally, Buildkite has expanded its cloud footprint and now operates from a third availability zone (AZ) in AWS, improving resilience to single-AZ incidents. The company plans to continue working on database improvements, including a potential migration to Aurora and partitioning of large tables.
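The error budget idea in the summary can be made concrete with a little arithmetic. The following is a minimal sketch, not Buildkite's implementation: the class name `ErrorBudget`, its fields, and the request-based SLI (successful requests over total requests) are all assumptions chosen for illustration, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo_target: float     # assumed SLO, e.g. 0.999 means 99.9% of requests succeed
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that violated the SLI in the window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Negative values mean the budget has been overspent.
        return self.allowed_failures - self.failed_requests

    @property
    def exhausted(self) -> bool:
        # Once exhausted, the policy described above applies:
        # pause feature work and focus on reliability.
        return self.remaining <= 0


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_200)
    print(f"allowed failures: {budget.allowed_failures:.0f}")
    print(f"exhausted: {budget.exhausted}")
```

With an assumed 99.9% target over one million requests, roughly 1,000 failures are allowed, so 1,200 failures exhaust the budget while 400 would leave about 600 to spare.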
Company
Buildkite
Date published
April 11, 2022
Author(s)
Miguel Molina
Word count
1066
Hacker News points
None found.
Language
English