Lightning Takes Down Amazon And Microsoft Clouds

Lightning strikes in Dublin disabled two major data centres and knocked backup power supplies offline

Lightning strikes cut the power to two major Amazon and Microsoft data centres and disabled backup systems in Dublin on Sunday, resulting in up to twelve hours of downtime.

Lightning struck a transformer, which Amazon said caused a fire and an explosion, followed by a total power outage. Amazon’s Elastic Compute Cloud (EC2) and Elastic Block Store (EBS) services were affected, and Microsoft’s BPOS services also went down.

The power of the bolt was such that part of the phase control system that synchronises the backup generators was disabled, Amazon said on its Service Health Dashboard. It said it began investigating connectivity issues at around 03:00 GMT yesterday; twelve hours later it was still struggling to restore 100 percent access.

Customers Left With Nowhere To Go

Microsoft told eWEEK Europe UK in a statement that, at around the same time, a widespread power outage caused connectivity issues for European BPOS customers. Services were restored to all customers around seven hours later, it said. In the past year, the BPOS service worldwide has seen several outages and at least one data breach. Microsoft is trying to move customers across to the recently launched Office 365.

Just six days ago, an article on the Daily Telegraph Website said Microsoft’s Dublin data centre includes a “comprehensive system of secondary electricity sources” and that the whole operation could switch seamlessly to Amsterdam in the event of a “major catastrophe”. When asked by eWEEK Europe UK, Microsoft would not say whether this system had come into play during yesterday’s power outage, but it appears it did not.

By 15:00, Amazon’s dashboard had reported that 75 percent of the affected EC2 instances had been recovered, but the scale of the disruption meant manual intervention was necessary before the remaining EBS volumes and EC2 instances could be restored.

“While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed,” it said at that time. “In some cases EC2 instances or EBS servers lost power before writes to their volumes were completely consistent.

“Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service,” Amazon promised.

Among the Websites affected were the Telegraph’s puzzles page, which is an Amazon customer, and the Edinburgh Book Festival. Service-level agreement (SLA) terms are rarely made public, but twelve hours of downtime for Amazon and seven for Microsoft work out to about 99.86 percent and 99.92 percent uptime respectively over a full year. It would be reasonable to assume that, even barring further downtime this year, both will have some penalties to pay to their customers if most of them hold a 99.99 percent SLA.
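The uptime percentages quoted above can be checked with some simple arithmetic. The sketch below is illustrative only: it assumes a 365-day year, takes the article's twelve-hour and seven-hour outages as the only downtime in that year, and uses the 99.99 percent SLA figure that the article itself presents as an assumption. The function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def uptime_percent(downtime_hours: float) -> float:
    """Annual uptime, as a percentage, for a given number of hours down."""
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Minutes of downtime per year that a given SLA percentage permits."""
    return (1.0 - sla_percent / 100.0) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # Outage lengths as reported in the article.
    print(f"Amazon, 12 hours down:   {uptime_percent(12):.2f} percent")
    print(f"Microsoft, 7 hours down: {uptime_percent(7):.2f} percent")
    # The 99.99 percent SLA is the article's assumption, not a published term.
    print(f"99.99 percent SLA allows {allowed_downtime_minutes(99.99):.1f} minutes a year")
```

On these assumptions, a 99.99 percent SLA leaves room for less than an hour of downtime a year, so an outage of several hours would breach it many times over.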

Microsoft’s Dublin site is its largest data centre outside the US, and its green credentials are heavily touted. For example, it uses Dublin’s naturally cool air for cooling rather than relying on power-intensive refrigeration. Amazon opened its data centre in Dublin in 2008 and is planning to expand the centre with the conversion of a 240,000 sq feet (22,300 sq metres) building.