Thoughts on the AWS outage: making the cloud more resilient to failure

Company

Cloudflare

Date Published

June 30, 2012

Author

Matthew Prince

Word count

1951

Language

English

Hacker News points

None

URL

blog.cloudflare.com/thoughts-on-the-aws-outage-the-failure-charac

Summary

A severe storm in June 2012 knocked out power at an Amazon Web Services (AWS) data center in Virginia, interrupting service for many companies that rely on AWS cloud hosting, including Netflix, Pinterest, and Instagram. The incident sparked a discussion about the different types of "cloud" services and how each responds to failure. Salesforce.com was among the first to promote the benefits of the cloud, offering Software as a Service (SaaS). Heroku, acquired by Salesforce.com, provides Platform as a Service (PaaS) and runs on top of AWS. AWS itself is Infrastructure as a Service (IaaS), which lets users rent virtualized hardware resources. Underneath all of these cloud services are physical servers, switches, and routers, and when that hardware fails, each type of service reacts differently. Salesforce.com runs its own hardware and software, replicating both the data and application layers across multiple systems. Replicating data is relatively easy, but keeping the copies in sync is hard, especially when the locations are geographically separated. This synchronization problem makes geographic scaling of the Data & Application layer difficult. The Front End layer, by contrast, can be distributed geographically without special application logic or complex data replication strategies. CloudFlare offers a scalable front end layer that can run in front of any web application to help it scale and to protect it against threats and attacks.