Google Blames Gmail Outage On “Dual Network Failure”

Max 'Beast from the East' Smolaks covers open source, public sector, startups and technology of the future at TechWeekEurope. If you find him looking lost on the streets of London, feed him coffee and sugar.


Two network outages at the same time caused the problem

Google has made a public apology for the Gmail outage that affected up to 212 million users on Monday, blaming it on two coincidental network failures.

The issue lasted from 14:00 to 23:30 BST, causing delays to message delivery and problems when downloading attachments. The company said it was triggered by a failure of not one, but two separate networks at the same time.

“The two network failures were unrelated, but in combination they reduced Gmail’s capacity to deliver messages to users,” explained Sabrina Farmer, senior site reliability engineering manager for Gmail. She said the company was already taking several steps to prevent a similar scenario occurring in the future.

Double trouble

The Gmail outage lasted for around nine hours and affected nearly one-third of all inboxes, making it one of the most significant disruptions to ever hit Google’s email service.

The delivery delay experienced by most users was just 2.6 seconds. However, about 1.5 percent of messages were delayed for more than two hours. Users who attempted to download large attachments in affected messages also encountered errors.

According to Farmer, messages started piling up after the two network failures, and although the engineering and networking teams were alerted straight away, they struggled to restore network capacity and clear the backlog.

“We’re taking steps to ensure that there is sufficient network capacity, including backup capacity for Gmail, even in the event of a rare dual network failure. We also plan to make changes to make Gmail message delivery more resilient to a network capacity shortfall in the unlikely event that one occurs in the future,” said the engineering manager.

The company is also making changes to the way its networking teams react to unexpected issues, in order to cut down response times.

Gmail’s last significant outage was a two-hour incident in December 2012.
