Amazon Web Services Hit By Service Interruption

Amazon Web Services (AWS) was hit with a service interruption on Sunday, 25 August, that caused four hours of degraded service for customers in one availability zone of its US-EAST-1 region and knocked a number of virtual machine instances offline. The company attributed the degraded service to the failure of a single networking device.

The first public acknowledgment from Amazon that there was some trouble with its cloud infrastructure came at 1:22 p.m. PDT on Sunday afternoon.

Degraded performance

“We are investigating degraded performance for some volumes in a single AZ in the US-EAST-1 Region,” an Amazon AWS status update reported.

The US-EAST-1 Region is a set of Amazon data centres located in Northern Virginia. Amazon groups the data centres within a region into isolated locations it calls “Availability Zones” (AZs). The purpose of the AZ concept is to provide fault tolerance, so that a failure in one location does not take down applications running in another.

Amazon currently operates eight regions in total, including three in Asia Pacific, one in Western Europe, one in South America and three in the United States. US-EAST-1 is the only Amazon region on the East Coast; the other two US regions are US-WEST-1, located in Northern California, and US-WEST-2, located in Oregon.

As it turns out, although Amazon did not report any trouble via its status update feeds for US-EAST-1 until 1:22 p.m. PDT on Sunday, the issue actually started approximately 30 minutes earlier. Amazon did not provide full details on the incident until 3:23 p.m. PDT, at which point an AWS status update noted, “From approximately 12:51 PM PDT to 1:42 PM PDT network packet loss caused elevated EBS-related API error rates in a single AZ.”

EBS is Amazon’s Elastic Block Store service, which provides persistent storage to virtual machines running on the Amazon cloud. Amazon noted that a “small” number of its cloud customers had virtual machine instances that became unreachable due to the EBS error. Among the sites that were impacted on Sunday afternoon were Airbnb, Instagram, Flipboard and Vine.

‘Partial failure’

“The root cause was a ‘grey’ partial failure with a networking device that caused a portion of the AZ to experience packet loss,” Amazon noted in its status update.

Amazon physically removed the failed networking device in order to restore service in US-EAST-1 to normal. It was not until 6:58 p.m. PDT that Amazon’s status update gave the all clear, indicating that normal performance had been restored.

The US-EAST-1 issue on Sunday is not the first time that Amazon has had trouble in that region. In 2012, storms knocked out power to Amazon’s East Coast availability zones, leaving the service unavailable. There was also an incident in 2011 that hit the Virginia-based East Coast AZs.

The whole concept behind the AZs, though, is to help customers mitigate the risk of an outage in any one geographical area.

“When you launch an instance, select a region that puts your instances closer to specific customers, or meets the legal or other requirements you have,” Amazon’s AZ documentation states. “By launching your instances in separate Availability Zones, you can protect your applications from the failure of a single location.”
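The advice in that documentation can be illustrated with a small sketch. It makes no AWS API calls; it only models a round-robin placement of instances across zones and shows which instances would remain if one zone failed. The zone names, instance names and both helper functions are hypothetical, chosen for illustration rather than taken from Amazon's tooling.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_names, zones):
    """Assign each instance to a zone in round-robin order."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in instance_names}


def survivors(placement, failed_zone):
    """Return the instances that are not placed in the failed zone."""
    return sorted(name for name, zone in placement.items() if zone != failed_zone)


# Hypothetical zone and instance names, for illustration only.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
instances = [f"web-{i}" for i in range(6)]

placement = spread_across_zones(instances, zones)
print(Counter(placement.values()))          # how many instances landed in each zone
print(survivors(placement, "us-east-1a"))   # instances still reachable if one zone fails
```

With six instances spread over three zones, losing any single zone leaves four instances running, whereas placing all six in one zone would leave none.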

Originally published on eWeek.

Sean Michael Kerner

Sean Michael Kerner is a senior editor at eWeek and a contributor to TechWeek.
