Richard Blanford, managing director, Fordway, examines how companies should tackle disaster recovery
How critical is your data? In my experience, an organisation that can accept data loss or inability to provide services for less than the cost of putting in place appropriate resilience is extremely unusual.
If your organisation uses IT to deliver its products or services then you need backup, recovery and resilience unless you want to put the company at risk. Choosing not to put suitable resilience and recovery in place is a business decision which should be taken at a higher level than the IT department.
The right questions
Assuming you have convinced the board of the need for disaster recovery (DR), it is vital to ask the right business questions before recommending a solution. In what order should services be restored? What recovery time objective (RTO) and recovery point objective (RPO) does each service require? The most important criteria for a backup and recovery plan are that it is realistic and proven.
The good news is that the cost and complexity of implementing DR solutions to meet sub-24 hour recovery times and recovery points are falling. Virtualisation is making services more portable, and SAN and application replication give better options and more reliable recovery methods than tape. There is also a range of hosted and cloud DR options available, and the pros and cons of these are discussed later in this article.
Matching DR solution to required recovery times
We generally recommend a tiered approach to data recovery in which restoring from tape is the recovery of last resort. Organisations which want critical services restored within four hours cannot rely on back-up tapes: the data could be up to 24 hours old, and it could take 24 hours or more to restore, so the best they can hope for is restoration of data that is 48 hours out of date. Tape backups are also less than 100 percent reliable. We expect a success rate of no better than 98 percent when restoring from tapes, particularly if they have been brought back from off-site. If you are using tape for your primary recovery then please ensure you test the tapes regularly.
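As a rough illustration of that arithmetic, the sketch below adds the age of the last backup to the time taken to restore it. The function names and the four-hour targets in the example are purely illustrative assumptions; only the 24-hour figures come from the scenario above.

```python
def worst_case_data_age_hours(backup_interval_hours, restore_hours):
    """Worst-case age of the restored data at the moment service resumes."""
    return backup_interval_hours + restore_hours


def meets_targets(backup_interval_hours, restore_hours, rpo_hours, rto_hours):
    """True only if both the recovery point and recovery time targets are met."""
    return backup_interval_hours <= rpo_hours and restore_hours <= rto_hours


# Nightly tape backup that takes a day to restore (figures from the text above).
print(worst_case_data_age_hours(24, 24))  # 48

# Hypothetical four-hour RPO and RTO for a critical service.
print(meets_targets(24, 24, rpo_hours=4, rto_hours=4))  # False
```

Running the same check against each candidate solution's tested figures, rather than its brochure figures, is one way to keep the plan realistic.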
Firstly, for critical business processes, there needs to be a level of inherent resilience in the core infrastructure so that the failure of a component or device does not affect the available services. Next, data replication to a second location, which may be cloud backup or similar, provides data in a more accessible format than tape for system recovery. Depending on business criticality, this can be either disk-based backup, which offers the slowest recovery time, or SAN or NAS data replication, generally using snapshots. For more critical services, near-real-time recovery using application replication, either through inbuilt application capabilities such as SQL Server Always On or through third party application replication tools, is now tested and works well.
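One way to picture the tiered approach is as a simple lookup from each service's required recovery time to a recovery method. This is a minimal sketch only: the hour thresholds and the example services are hypothetical assumptions, not recommendations, and should be replaced with figures from your own tested recoveries.

```python
# Options named in the text, fastest first. The hour thresholds are
# illustrative assumptions: each is the loosest RTO for which that tier
# would be chosen in this sketch.
TIERS = [
    (1, "application replication"),
    (4, "SAN or NAS snapshot replication"),
    (24, "disk-based backup at a second location"),
]
LAST_RESORT = "tape backup"


def recommend_tier(rto_hours):
    """Return the first tier whose threshold covers the required RTO."""
    for max_rto_hours, option in TIERS:
        if rto_hours <= max_rto_hours:
            return option
    return LAST_RESORT


# Hypothetical services and their required RTOs in hours.
services = {"order processing": 0.5, "email": 4, "file archive": 72}

# Listing the tightest RTO first also gives a first-cut restoration order.
for name, rto in sorted(services.items(), key=lambda item: item[1]):
    print(f"{name}: RTO {rto}h -> {recommend_tier(rto)}")
```

Keeping the thresholds in one table makes it easy to revisit them as costs change or as testing shows what each method really achieves.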
Cloud is particularly well-suited to businesses that cannot provide DR themselves, for a number of reasons: they may have no suitable second site, be unable to afford the cost, time and management overhead of a second environment, or simply find DR too difficult to achieve. Choosing a cloud-based solution enables them to have state of the art DR capabilities at a substantially lower cost than traditional DR services. Instead of having all their own backup equipment, they pay a monthly fixed cost per TB of protected data with limitless capacity.
Businesses may be able to reduce costs further by finding cloud solutions that use infrastructure compatible with their own, using technologies such as SAN replication, hypervisor replication and application replication to minimise data transfer and replication timescales.
However, there is a downside to using cloud for DR. Some solutions are located offshore, whereas to meet compliance requirements organisations may need to keep their data in the UK. It is vital to find out where the data centres providing the services are located. Data should be stored in a jurisdiction that has the correct safeguards in place, and in a way that does not contravene the Data Protection Act or comparable legislation in other jurisdictions. If the potential service provider cannot provide a contract with guarantees on this, look for another provider.
In the past, security fears have been a key inhibitor for the take-up of cloud services. In most cases, however, these fears are overstated, and should be considered more general risk management than security. With appropriate due diligence cloud can actually provide improved security, because most cloud service providers will implement and manage considerably better IT security controls than internal IT departments.
Are cloud RPOs and RTOs really what they seem?
When comparing the RPOs and RTOs for potential DR solutions, organisations need to ensure that they understand what they are comparing. For example, many cloud DR solutions claim to offer a faster recovery time objective (RTO), but around a quarter of respondents in a survey reported slower recovery times.
The fastest RTO is obtained by having services running active across two data centres with automated failover, irrespective of whether the data is hosted internally or in the cloud. The cloud response time, however, will always depend on external communications, and the bandwidth for external communications is normally lower than that for internal communications, hence the discrepancy. The advantage of using cloud is that it gives you more options.
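To see why bandwidth matters so much, the sketch below estimates how long it takes to move a given volume of data over a link. It is a back-of-envelope calculation only: the function name and the example link speeds are illustrative assumptions, and it ignores protocol overhead, compression and deduplication.

```python
def transfer_hours(data_tb, link_mbps):
    """Hours needed to move data_tb terabytes over a link of link_mbps Mbit/s."""
    bits = data_tb * 8 * 10**12            # decimal terabytes to bits
    seconds = bits / (link_mbps * 10**6)   # megabits per second to bits per second
    return seconds / 3600


# 1 TB over a hypothetical 100 Mbit/s external link.
print(round(transfer_hours(1, 100), 1))     # 22.2

# The same 1 TB over a hypothetical 10 Gbit/s internal network.
print(round(transfer_hours(1, 10_000), 2))  # 0.22
```

On these assumed figures, a restore that takes minutes inside the data centre takes most of a day over the external link, which is why it pays to ask a provider how data will actually be brought back.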
The same caution is needed when comparing costs. DRaaS, like all cloud solutions, is pitched as cheaper than on-premise, but it may not be a cheaper solution in all cases. If you compare the cost of buying a second SAN with the cost of DRaaS, the cloud service will be more expensive, because its price includes the people who manage the service as well as hardware and hosting. That is only a like-for-like comparison if those people are already in place: if the CIO already runs the DR service in-house and has the staff available to manage it and to check that replication has completed successfully, then in-house will be cheaper.
Cloud backup does not need to be prohibitively expensive. For example, Fordway’s DRaaS enables organisations to have state of the art disaster recovery capabilities at a fraction of the price of traditional supplier services by using technologies such as SAN replication, hypervisor replication and application replication.
One organisation which has implemented cloud-based DR is Team BFK, one of the engineering consortia building infrastructure for London’s Crossrail. It has implemented a cloud-based solution using Fordway infrastructure in two UK data centres to provide storage and back-up respectively. Fordway provides a ‘warm standby’ recovery service, and in the event of a problem can bring all systems back online within 30 minutes to two hours.
Looking to the future
Cloud-based DR may not suit every organisation, but it offers specific benefits that should certainly be considered when reviewing an organisation’s DR strategy. It also offers a starting point to move other services to cloud, giving organisations confidence to consider cloud when they are ready for their next upgrade.
There are also new technologies coming along, such as hypervisor-independent replication, which will enable DR at a more granular level and provide medium-sized organisations with a more flexible and cost-efficient solution.