Anyone attempting to reach several high-profile websites a few weeks ago was greeted with blank screens. Last month, a fire at OVHcloud illustrated how vulnerable online businesses could be if they don’t have adequate contingency plans.
The company’s datacentres host over three million websites across Europe. In a statement, OVHcloud said: “For customers who have been impacted, we are offering replacement infrastructures (Bare Metal, Hosted Private Cloud and Public Cloud) in our Roubaix (RBX) and Gravelines (GRA) data centres.”
Every business that has an online presence or uses network infrastructure should have robust and comprehensive contingency plans in place to mitigate the risk of downtime. Many enterprises, though, take a less than complete approach to their network health, and they often pay even less attention to the disasters these systems could suffer.
Vertiv’s report into the future of datacentres found: “Participants were asked to define their dependence on data centres based on the impact of an outage to their business. Some 38% identified their data centre as ‘critical to their business’ while an additional 21% said the business is ‘totally dependent’ on the data centre. Only 10% indicated their business could ‘operate for limited periods without computing.’”
Speaking to Silicon UK, David Lanagan, Infra and Apps Practice Lead at CloudStratex, says: “Contingency planning is, of course, a critical part of the design of any system or service. However, it is often not assessed properly.
“True understanding of the criticality of your services and the level of resilience they need is a must for proper technical design but must be done in the round. Understanding the business side of any given contingency plan is just as important. Often we see technology trying to design around a level of criticality that is assumed rather than worked on with the business. This can, and does, result in services being over- or under-designed – both with potentially huge financial consequences.”
Balancing cost against the need for more comprehensive contingency planning is a constant challenge for CIOs and CTOs. With the datacentre landscape in flux thanks to the pandemic and a range of maturing technologies, bringing a new contingency plan into focus is not easy. Where relatively simple data backup was once enough, today’s data and network infrastructures are more complex and dispersed, and they require a new kind of contingency plan.
The contingency plans all businesses must have in place are about to change radically. As networks evolve to meet the challenges of a post-COVID-19 business landscape, as burgeoning technologies such as 5G, IoT and AI take hold, and as the potential end of the hyperscalers looms, protecting these new data environments will be critical to business sustainability.
In a 2018 blog post, Gartner stated: “Workload placement in a digital infrastructure is based on business need, not constrained by physical location. I&O leaders must build an ecosystem of service partners to help enable scalable, agile infrastructures. Distributed digital infrastructure management will provide the tools for I&O to monitor and manage any asset or process, anywhere, at any time, enabling a successful transition to digital business. The movement to digital infrastructure will result in radically increased complexity for I&O so staff must be retrained, with a focus on versatility.”
In the context of data and network contingency planning, Gartner’s views are prescient. They define how management and oversight must change to meet the challenges businesses face today as they construct their digital transformation roadmaps and overhaul how they secure their hybrid cloud deployments.
“One of the lessons from OVH was that many customers made assumptions about data being backed up,” explained Peter Groucutt, Managing Director, Databarracks. “One of the big benefits of cloud computing is that it decouples the individual building blocks of IT into discrete services. You don’t buy a server, you buy object storage, compute and networking. The benefit is that you consume exactly what you need. The downside is that it needs more thought and consideration.”
Groucutt continued: “The hyperscale clouds allow you to build resilience across zones within regions. This lets you use multiple datacentres within a region, which allows you to deal with issues affecting an individual data centre. There are still issues that can affect an entire region. Storms on the US East Coast have affected AWS’s US-East-1 region, for instance.”
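Groucutt’s distinction between zone-level and region-level failures can be illustrated with a minimal failover rule: prefer any healthy zone in the primary region, and fall back to a secondary region only when every zone there is down. This is a sketch under stated assumptions; the `Zone` type, `choose_zone` function, region and zone names, and health flags are all hypothetical and do not correspond to any cloud provider’s API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Zone:
    region: str   # hypothetical region label
    name: str     # hypothetical zone label
    healthy: bool # result of some external health check (assumed)


def choose_zone(zones: list[Zone], primary: str, secondary: str) -> Optional[Zone]:
    """Return a healthy zone, trying every zone in the primary region before the secondary region."""
    for region in (primary, secondary):
        for zone in zones:
            if zone.region == region and zone.healthy:
                return zone
    return None  # no healthy zone in either region


# Hypothetical topology: one datacentre (zone a1) in the primary region has failed.
zones = [
    Zone("region-a", "a1", healthy=False),
    Zone("region-a", "a2", healthy=True),
    Zone("region-b", "b1", healthy=True),
]
print(choose_zone(zones, "region-a", "region-b").name)  # a2
```

A single-datacentre outage is absorbed inside region-a. A region-wide event of the kind Groucutt describes would mark both a1 and a2 unhealthy, and only then does the function return b1 in the second region.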
Russ Kennedy, Chief Product Officer at Nasuni, outlined to Silicon UK a three-step process for developing more robust contingency plans:
“The essential features of an effective contingency plan for the enterprise need to be speed and data recovery as close to the point of attack as possible – allowing your business to restore operations quickly and for many global sites simultaneously.
“Once you restore a copy in the cloud, that change is synced out to all your other regional or global sites. You also need an immutable file system in which every version is immutable. This makes the file system highly resilient: no incident can corrupt those previous versions. The advantage here is not so much the fact that files are stored in the cloud, but how they are stored – as immutable WORM (write once, read many) data.
“Finally, as part of your plan, you need a solution that is testable. You need to prepare, plan, and run through simulated attacks and potential risk scenarios across your organisation. Work closely with your cloud provider to come up with recovery playbooks and run quarterly tests on small data sets. That way, if you are attacked or experience a disaster, you’ll know exactly what to do to recover as quickly as possible and ensure business continuity.”
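Kennedy’s point about immutable versions can be made concrete with a minimal in-memory sketch of a WORM-style versioned store. It assumes a simple model in which every write appends a new numbered version and a restore copies an old version forward instead of deleting anything. The class and method names are hypothetical, and this is not how Nasuni or any other product implements its file system; a real system enforces immutability at the storage layer rather than in application code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Version:
    number: int
    data: bytes


@dataclass
class WormFileStore:
    """Hypothetical append-only store: writes add new versions, nothing is overwritten or deleted."""
    versions: dict[str, list[Version]] = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> int:
        history = self.versions.setdefault(path, [])
        version = Version(number=len(history) + 1, data=data)
        history.append(version)  # append only: earlier versions are never modified
        return version.number

    def read(self, path: str, number: Optional[int] = None) -> bytes:
        history = self.versions[path]
        if number is None:
            return history[-1].data  # latest version
        if not 1 <= number <= len(history):
            raise ValueError(f"{path} has no version {number}")
        return history[number - 1].data

    def restore(self, path: str, number: int) -> int:
        """Recover by copying an old version forward as a new version; the bad version is kept."""
        return self.write(path, self.read(path, number))


store = WormFileStore()
store.write("finance/report.xlsx", b"clean contents")
store.write("finance/report.xlsx", b"encrypted by ransomware")  # the incident
store.restore("finance/report.xlsx", 1)
print(store.read("finance/report.xlsx"))  # b'clean contents'
```

Because the encrypted version is retained rather than erased, the history also serves as evidence of what happened, and a quarterly restore drill of the kind Kennedy recommends can exercise `restore` on a small sample without risking existing data.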
Vertiv’s report that looks ahead to datacentres in 2025 concludes: “Since 2014, we’ve seen larger and larger cloud facilities being developed, creating a class of hyperscale facilities with distinct and innovative architectures. At the same time, more data is being generated and consumed at the network edge, forcing compute and storage closer to users and devices in the form of mini and micro data centres.”
The report also asks the pressing question: “How many computing sites is your company supporting today, and how many do you expect by 2025?” The results are telling. Among participants who have edge sites today or expect to have them in 2025, more than half (53%) expect the number of edge sites they support to grow by at least 100%, with 20% expecting an increase of 400% or more. Yet even this doesn’t fully capture the magnitude of the change.
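Percentage increases of this size are easy to misread. Writing g for the expected percentage increase and N for the number of edge sites, the survey figures translate into multipliers as follows (the ten-site starting point is purely illustrative, not a figure from the report):

```latex
\begin{aligned}
N_{2025} &= N_{\text{today}}\left(1 + \frac{g}{100}\right) \\
g = 100 &\;\Rightarrow\; N_{2025} = 2\,N_{\text{today}} \quad (\text{e.g. } 10 \to 20 \text{ sites}) \\
g = 400 &\;\Rightarrow\; N_{2025} = 5\,N_{\text{today}} \quad (\text{e.g. } 10 \to 50 \text{ sites})
\end{aligned}
```

In other words, a 400% increase means five times as many sites, each of which needs to be covered by the contingency plan.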
A similar survey from Emerson found: “Our survey concludes 10% of participants believe the enterprise datacentre of 2025 will be one-tenth the size of current facilities, while 58% expect that datacentres will be half the size of current facilities or smaller. This could make owned resources more competitive in relation to the cloud as smaller, denser datacentres are easier to build and operate as a Tier IV, 2N+1 configuration than very large ones.”
Databarracks’ Peter Groucutt concludes: “The fundamental challenge all organisations face is balancing cost and risk. The technology is available for you to be as resilient as you would like. You could have high availability across multiple continents with backups of your data in each major cloud provider. The cost to do so would be incredibly high. It is up to you to choose how much protection you need to meet your recovery objectives. Backups are inexpensive but have a long recovery time. High availability across multiple sites is fantastic resilience but is very expensive.”
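Groucutt’s cost-versus-risk framing can be expressed as a simple selection rule: among the available protection options, choose the cheapest one whose recovery time meets your recovery time objective (RTO). The three options mirror those mentioned in this article (backups, resilience across zones in one region, and high availability across multiple sites), but the costs and recovery times are made-up relative figures for illustration only, not benchmarks or pricing.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    annual_cost: float     # illustrative relative cost, not real pricing
    recovery_hours: float  # assumed time to restore service after an outage


def cheapest_meeting_rto(strategies: list[Strategy], rto_hours: float) -> Strategy:
    """Return the lowest-cost strategy whose recovery time is within the RTO."""
    eligible = [s for s in strategies if s.recovery_hours <= rto_hours]
    if not eligible:
        raise ValueError("no strategy meets this recovery time objective")
    return min(eligible, key=lambda s: s.annual_cost)


# Hypothetical options, from cheap-but-slow to expensive-but-fast.
options = [
    Strategy("backups only", annual_cost=1, recovery_hours=48),
    Strategy("multi-zone within one region", annual_cost=4, recovery_hours=1),
    Strategy("multi-site high availability", annual_cost=10, recovery_hours=0.1),
]
print(cheapest_meeting_rto(options, rto_hours=24).name)  # multi-zone within one region
print(cheapest_meeting_rto(options, rto_hours=72).name)  # backups only
```

Tightening the RTO pushes the choice towards the expensive end of the list, which is exactly the trade-off Groucutt describes: the technology exists, and the business decides how much protection it is willing to pay for.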
Datacentres, whether centralised or more dispersed as edge networks become commonplace, must all have a comprehensive and integrated contingency plan. The fire at OVHcloud illustrates how vulnerable any business can be to unforeseen events.
If your business’s datacentre had a fire tomorrow, how would this impact your enterprise? And do the contingency plans you have go far enough? Network infrastructures are rapidly changing. It’s time for a new approach to contingency planning as well.
David Halford, VP of Business Continuity, Fusion Risk Management.
What does the recent fire at the French cloud service provider OVHcloud teach us about the importance of contingency planning?
“Unfortunately, it reinforces a mistake we see many companies make when using cloud services. The risk associated with a facility outage and other significant events is often assumed to be covered by the agreement. This assumption clearly requires validation – ‘Trust but Verify’ is a critical component of contingency planning and must be incorporated into these types of services.”
Many businesses focus on data security. Is the security of the physical infrastructure that data is stored on and travels across often forgotten?
“The statement is even more significant when considering a hybrid model with some infrastructure on-premise and a portion provided by a solution provider.”
What are the common challenges businesses face when developing and maintaining a contingency plan for their IT infrastructure?
“The most prevalent issues are in three key areas:
“Lack of clear management support, funding for a solution, and recognising the importance of contingency planning vs just data backups and protecting the data. Contingency planning ensures you can continue operating and providing a solution to your clients. Data security and protecting your environment from a cyber breach, while extremely important, do not address Operational Resilience type contingency planning. Both can put you out of business and need appropriate funding and executive support.
“Maintaining a clear picture of what needs to be recovered based on priorities, recovery timing, and how the actions will actually be executed. This comes down to the basics of maintaining change control in conjunction with planning and exercising and validating the required capability.
“Having a defined methodology and system capable of keeping the required information current. This includes all the business and technology dependencies required to understand the environment, recognise how it might break, and ultimately use that knowledge to build mitigation and active response plans.”
What are the core components of a Crisis Management Plan (CMP)?
“CMPs are most appropriately considered as an overall plan focused on command and control of a crisis event. The content and depth of this command and control depend somewhat on the enterprise; however, the baseline information is based on the following key components:
“An established programme methodology that at minimum defines under what circumstances an ‘incident’ is elevated to a crisis and who makes that decision. The programme methodology should include all the related details of how a crisis will be managed and what to expect as a stakeholder and participant.
“Defining how the event will be managed including the defined team, leadership structure, communication plans, and approach for executing actions based on the unique information associated with the event itself.
“While this may sound similar to the above, it is critical to have a defined methodology and system capable of keeping baseline information current. This again includes a clear understanding of the environment and being able to recognise how it might break. Often the conditions on the ground during an active event require quick decision making. Having a system that enables quick analysis and decision making based on the current situation is required.”
Has the pandemic impacted the contingency planning businesses have in place to protect their network infrastructures?
“In some areas, it has reduced pressure on network infrastructure capacity itself; however, it has significantly increased security-related concerns around how an enterprise network is used by remote employees. This of course depends on how much of the environment is cloud or SaaS-based.”
Businesses are increasingly adopting a hybrid approach to their cloud deployments. What does a robust and comprehensive contingency plan look like in this context?
“The most important aspect is recognising the hybrid environment as it is and applying due diligence, security methods, risk analysis, and ‘Trust but Verify’ to all parties (cloud and third parties included). Often, an enterprise will accept the ‘contract terms’ as having it covered. However, the financial liability most cloud providers accept if their solutions fail does not come close to the real business impact on the enterprise. You must treat it as any other operational risk.”