Why HBO Chose Kubernetes To Help Stream Game Of Thrones

Game of Thrones is one of the most popular TV series in the world, with fans watching both on the regular HBO network and via the HBO Go streaming service.

Streaming a popular show like Game of Thrones poses many technical challenges, and to help solve some of them, HBO’s developers turned to the open-source Kubernetes container orchestration platform.

At the KubeCon North America 2017 event on Dec. 7, engineers from HBO explained how and why they chose Kubernetes to help meet demand for Game of Thrones season seven, which aired in July and August 2017.

Kubernetes containers

“We went from not running a single service inside of a container to hosting all of Game of Thrones season 7 with Kubernetes,” Illya Chekrygin, Senior Staff Engineer at HBO, told the KubeCon audience.

Kubernetes is an open-source container orchestration system that was originally developed by Google. Since July 2015, Kubernetes has been a hosted project at the Cloud Native Computing Foundation (CNCF) and benefits from the contributions of multiple end-users and software vendors.

Chekrygin explained that the HBO Go streaming service is a mesh of different API services, all written in Node.js or the Go programming language. HBO’s streaming platform is deployed on Amazon Web Services (AWS) and had originally been built on EC2 virtual instances set up with auto-scaling capabilities. To help handle demand, the original HBO Go architecture also used load balancers to distribute traffic.

“HBO’s traffic pattern can be best described as, the wall,” Chekrygin said.

The “wall” is a dramatic spike straight up during prime time, when viewers start to watch content on HBO. Chekrygin said that looking at HBO Go traffic patterns for Game of Thrones, episode after episode and season after season, left HBO engineers with serious doubts about their ability to handle future demand.

“The challenge with streaming a highly anticipated weekly show is the demand spikes when a new show is released,” Chekrygin said.

Among the challenges HBO engineers faced was under-utilization of deployed resources. Chekrygin explained that Node.js code tends to use only a single CPU core. He noted that AWS EC2 instances with good networking capabilities tended to be based on dual-core CPUs. As a result, HBO was using only 50 percent of the CPU capacity it had deployed.
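
The 50 percent figure follows from simple arithmetic. A rough sketch in Python makes the reasoning explicit; the function name and the assumption that each process can saturate exactly one core are illustrative and do not come from HBO's talk.

```python
def peak_cpu_utilization(cores: int, single_threaded_procs: int = 1) -> float:
    """Upper bound on instance CPU utilization when each process can
    saturate at most one core (an assumption made for illustration)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    # Processes beyond the core count cannot add more usable CPU.
    return min(single_threaded_procs, cores) / cores


# One single-threaded Node.js process on a dual-core instance.
print(peak_cpu_utilization(cores=2))  # 0.5
```

Under the same assumption, placing a second single-threaded process on that instance would raise the bound to 100 percent, which is the kind of packing a container orchestrator is designed to do.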

Spinning up new EC2 instances wasn’t as fast as HBO needed, so Chekrygin said the engineers had to over-provision capacity to deal with sometimes unpredictable traffic patterns. There was also a need to over-provision ELB (Elastic Load Balancing) instances to help deliver content.

“We were under-utilizing 50 percent of our CPUs and yet we found that we were running out of all of our other resources,” Chekrygin said. “So to keep up with usage, we had dedicated alerts that were sent anytime we crossed 80 percent utilization on ELB and other resources.”
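
An alert of the kind Chekrygin describes amounts to a threshold check on each resource. The sketch below is a hypothetical illustration only: the function name, the resource names and the sample numbers are invented, and only the 80 percent threshold comes from the quote.

```python
ALERT_THRESHOLD = 0.80  # the 80 percent figure from the quote


def resources_over_threshold(
    usage: dict[str, tuple[int, int]],
    threshold: float = ALERT_THRESHOLD,
) -> list[str]:
    """Return the names of resources whose used/limit ratio exceeds threshold."""
    return sorted(
        name
        for name, (used, limit) in usage.items()
        if limit > 0 and used / limit > threshold
    )


# Hypothetical (used, limit) pairs, for illustration only.
sample = {"elb": (17, 20), "ip_addresses": (150, 251)}
print(resources_over_threshold(sample))  # ['elb']
```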

HBO also found that at times of peak demand for Game of Thrones, it was running out of available IP addresses to help deliver the content to viewers.

Why Kubernetes?

“I was sold on Kubernetes from the start,” Chekrygin said. “But we did our due diligence and looked at Mesos, (Docker) Swarm and ECS (Amazon EC2 Container Service).”

Among the reasons why HBO chose to go with Kubernetes was the improved utilization and introspection capabilities that the technology provides. Additionally, Kubernetes was seen by HBO as being faster and safer than other options.

While HBO settled on Kubernetes as its technology choice, it had no plans to move away from AWS, even though the cloud service didn’t have its own Kubernetes service when HBO made the decision in 2016. That has recently changed.

AWS announced on Nov. 29 the availability of its own Elastic Container Service for Kubernetes (EKS). Chekrygin said that HBO engineers did a lot of work to make sure that Kubernetes could meet the production deployment needs of Game of Thrones.

Zihao Yu, Senior Staff Engineer at HBO, told the KubeCon audience that HBO ended up using custom Terraform templates to help manage and deploy the Kubernetes cluster. Terraform is an open-source project developed by HashiCorp that helps organizations manage infrastructure deployments as code.

On the networking side, Yu said that HBO chose the open-source Flannel software-defined networking (SDN) overlay technology and created custom security groups to help handle service delivery.

Lessons Learned

Getting the Game of Thrones season seven premiere ready to be delivered by a Kubernetes-powered streaming platform involved many different steps, including lots of testing.

“For two or three months leading to the Game of Thrones premiere we ran a weekly mega load-test,” Chekrygin said. “Our first attempts were just pitiful.”

Load testing showed the HBO engineering team where the gaps were. Chekrygin said that fine-tuning and analysis were required to get the platform ready for the Game of Thrones season seven premiere. He added that without the online resources, community events and discussion channels for Kubernetes, there is a very high chance that the HBO Kubernetes journey would not have ended well.

“For us, we found that many problems with our services were not caused by Kubernetes. They were there all along and Kubernetes just made them more visible,” Chekrygin said. “We looked at alternatives, but the biggest reason we chose Kubernetes was the vibrant and active community.”

Originally published on eWeek

Sean Michael Kerner

Sean Michael Kerner is a senior editor at eWeek and contributor to TechWeek
