Lightning Strikes Cause Of Google Cloud Outage

Google's data centre in St. Ghislain, Belgium, was hit by four lightning strikes to the local power grid, causing a temporary loss of power

Four successive lightning strikes that affected a data centre in Belgium were the cause of the four-day cloud outage last week, Google has admitted.

From Thursday August 13 to Monday August 17, errors on Google’s Compute Engine persistent disks in its Belgium data centre caused a blackout for a small portion of customers in Europe.

Loss of power

Google has now said that lightning strikes, which hit a local power grid, caused a loss of power to some of its storage systems, resulting in the persistent disk errors.

“At 09:19 PDT on Thursday 13 August 2015, four successive lightning strikes on the electrical systems of a European datacenter caused a brief loss of power to storage systems which host disk capacity for GCE instances in the europe-west1-b zone,” wrote Google this week.

“Although automatic auxiliary systems restored power quickly, and the storage systems are designed with battery backup, some recently written data was located on storage systems which were more susceptible to power failure from extended or repeated battery drain.”

Despite the random nature of the cause, Google is fessing up and taking full responsibility for the outage.

“This outage is wholly Google’s responsibility,” said the search giant.

“However, we would like to take this opportunity to highlight an important reminder for our customers: GCE instances and Persistent Disks within a zone exist in a single Google data centre and are therefore unavoidably vulnerable to data centre-scale disasters.”

Google engineers worked to resolve the incident, and the number of affected disks declined progressively as they fixed issues. By Monday, only a very small number of disks remained offline, totalling less than 0.000001 percent of the space of allocated persistent disks in europe-west1-b. Unfortunately, these disks are unrecoverable, said Google.

To prevent a repeat, Google said it is working on upgrading its storage hardware to make it less susceptible to power failures.

“Since the incident began, Google engineers have conducted a wide-ranging review across all layers of the data centre technology stack, from electrical distribution systems through computing hardware to the software controlling the GCE persistent disk layer,” said Google.
