From the June 2011 issue of Treasury & Risk magazine

Fail-Safe for Clouds

Companies with backup systems in place kept their businesses running when a major outage at Amazon shut down websites.

The Amazon Elastic Compute Cloud (EC2) outage that brought down a number of major websites in mid-April, including the social media sites Foursquare, HootSuite and Reddit, underscores the value of backup for cloud users, whether on traditional servers, another cloud or even another zone of their provider’s cloud. Amazon took the blame for the disruption, which involved the cloud’s Elastic Block Store (EBS) storage service, and said in a statement that it would compensate customers.

San Francisco-based Mashery, which provides application programming interfaces to more than 100 brands, including Best Buy, Hoover’s and the New York Times, escaped the outage. Mashery uses multiple Amazon zones for its own services and relies on traditional data centers from Atlanta-based Internap Network Services for additional backup. It also avoided EBS altogether: Mashery did not consider the service designed for durability, and EBS was not covered by Amazon’s service level agreement.

“We had some customers whose underlying APIs used those services, and they had some challenges,” says Mashery CEO Oren Michels. “It was just as if they had a database server in their center that went down.”

Despite its heavy use of Amazon, Silicon Valley photo-sharing company SmugMug also stayed up. Like Mashery, SmugMug spread its systems across multiple Amazon data centers, avoided using EBS and planned ahead for the possibility of disruption.

“People would have stayed up if they had used the redundancy options Amazon provides,” says SmugMug CEO Don MacAskill.

Vendors such as New York City-based Bluewolf can help companies move applications between cloud zones or to a traditional data center for backup. None of Bluewolf’s 75 corporate customers, which include Time Warner, Toyota and Bank of America, were affected by the Amazon outage.

“We had everything configured with Amazon so if there was a failover in one zone, another zone would seamlessly take over without any manual intervention,” says Bluewolf co-founder Michael Kirven. “When the Dulles [Va.] facility went down, a different facility in California picked up all the traffic.”

Bluewolf also offers a backup plan in case the entire Amazon cloud goes down: an automatic switchover to a brick-and-mortar data center, he adds. “It eliminates some of the cost savings from going into the cloud, but it hedges your security.”

A backup plan needs to be in place and tested well before an outage occurs, Kirven warns. “If [the backup facility] isn’t in ‘hot standby’ mode, it’s very difficult to get it switched over. It can take days, weeks or even longer.”
