AWS customers experienced partial network outages last month that affected an availability zone's connectivity to the internet. The failures highlighted the importance of monitoring key metrics and preparing for infrastructure failures. Monitoring the distribution of errors, especially its outliers, can help detect "grey" partial failures, which an aggregate figure tends to hide. When the problem lies in shared infrastructure, it is crucial to rely on infrastructure already deployed in another zone or region. Building for failure and having a contingency plan are essential to minimizing the impact of such incidents.
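
As a rough illustration of monitoring error distributions for outliers, the sketch below flags any zone whose error rate sits far above that of its peers, using a modified z-score based on the median absolute deviation. The zone names, the sample error rates, the function name `find_outlier_zones`, and both thresholds are assumptions made for this example; they are not taken from the incident or from any AWS API.

```python
from statistics import median


def find_outlier_zones(error_rates, threshold=3.5, min_abs_gap=0.01):
    """Return zones whose error rate is an upward outlier among their peers.

    error_rates maps a zone name to its error rate (0.0 to 1.0).
    threshold is the modified z-score cutoff (assumed value).
    min_abs_gap ignores gaps too small to matter operationally (assumed value).
    """
    rates = list(error_rates.values())
    if len(rates) < 3:
        return []  # too few peers to say what "typical" looks like

    med = median(rates)
    # Median absolute deviation: a spread estimate that one bad zone cannot skew.
    mad = median([abs(r - med) for r in rates])

    outliers = []
    for zone, rate in error_rates.items():
        gap = rate - med
        if gap < min_abs_gap:
            continue  # at, below, or only slightly above the typical rate
        if mad == 0:
            outliers.append(zone)  # all peers identical, so any real gap stands out
            continue
        # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
        if 0.6745 * gap / mad > threshold:
            outliers.append(zone)
    return outliers


# Hypothetical per-zone error rates over one monitoring window.
sample = {"zone-a": 0.002, "zone-b": 0.003, "zone-c": 0.08, "zone-d": 0.002}
print(find_outlier_zones(sample))
```

Comparing each zone against the median of its peers, rather than against a fleet-wide average, is one way to keep a single degraded zone from being diluted by healthy ones. A real deployment would feed this from its own metrics pipeline and tune both thresholds to its traffic.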