If 2012 was the breakthrough year in which many organizations took the plunge and joined the “Cloud Rush” by adopting a wide array of SaaS, PaaS and IaaS offerings, then 2013 will be the year they learn how to make good use of these powerful new resources without harming their businesses in the process.
While there have been plenty of successful cloud deployments that are producing measurable business benefits, there are still too many stories of service outages and other unanticipated problems hurting even the biggest cloud proponents.
The most recent example is Netflix, which was hit by Amazon Web Services’ (AWS) latest service disruption on Christmas Eve. Netflix was beginning to regain momentum after a new pricing scheme had turned off many of its customers. As for every consumer-oriented company, the holiday season is a critical sales period for Netflix, so the latest AWS outage couldn’t have come at a worse time.
AWS has reported that the outage was caused by human error, which led to the loss of production data belonging to its Elastic Load Balancing Service (“ELB”) in its U.S.-East Region. Although the service disruption “only affected applications using the ELB service (and only a fraction of the ELB load balancers were affected), the impacted load balancers saw significant impact for a prolonged period of time,” according to the AWS report.
This isn’t the first time this part of AWS’s operations has run into trouble. In fact, it was the focal point of multiple disruptions in 2012 alone. After each incident, AWS publicly reported the cause of the problem and promised to change its operational protocols to prevent similar issues from recurring. Yet new incidents have occurred, and AWS users are learning the hard way that they must take added precautions to protect themselves from the consequences of these events.
Despite these incidents, demand for cloud services from AWS and other leading providers continues to rise. A growing number of brand-name companies like Netflix, along with other major institutions, are becoming more reliant on cloud services to power various aspects of their operations.
Although Netflix received a lot of attention because it bore the brunt of the outage, the company is not planning to abandon its cloud commitments. Instead, it is carefully reexamining how it will manage these services going forward.
Netflix’s cloud architect, Adrian Cockcroft, published his own post-mortem of the AWS outage, in which he stated, “Netflix is designed to handle failure of all or part of a single availability zone in a region . . . . We are working on ways of extending our resiliency to handle partial or complete regional outages.” He went on to say, “We have plans to work on this in 2013. It is an interesting and hard problem to solve, since there is a lot more data that will need to be replicated over a wide area . . . . Naive approaches could have the downside of being more expensive, more complex and cause new problems that might make the service less reliable.”
Cockcroft understands that capitalizing on cloud services carries risks, but that those risks are far outweighed by the tangible business benefits. Still, he and his peers have a lot to learn about how to maximize the value of today’s cloud solutions while mitigating the risks of relying on these relatively immature services.