
Having a disaster recovery (DR) plan is an important part of ensuring business resiliency and continuity in the face of any event that causes a loss of access to the company’s primary production environment.
However, it’s not enough to simply have a disaster recovery plan—businesses need to test that DR strategy to verify that it will work when they need it.
A recent real-world example of the need to implement and test a DR strategy is the Delta Air Lines outage that occurred in early August.
According to news coverage from the Wall Street Journal, “An electric problem at its Atlanta headquarters occurred at 2:30 a.m. ET and the airline was forced to hold hundreds of departing planes on the ground starting at 5 a.m.”
One relatively minor electrical problem resulted in hundreds of cancelled and delayed flights, costing the airline giant millions in revenue, and damaging the company’s public image.
Why Testing Your DR Plan is a Necessity
Leaving the fate of a company to an untested, unknown suite of tools and technologies is reckless at best. Yet, when companies fail to test their DR plans, that’s exactly what happens.
One potential issue is that some DR setups do not address every potential failure point, or do not build in enough redundancy to ensure that the plan will work in all instances. Rigorous testing helps companies identify these fault points in a DR plan and add the necessary redundancies to avoid a total failure.
Another issue is the time it takes for a DR solution to engage and bring operations back to normal after a “disaster” strikes. A DR plan might have a stated recovery time objective (RTO) of just a few hours, but without testing, how can a company be sure of that?
Testing the DR plan gives the IT department a much more reliable estimate of how long it will take for business operations to fully recover than a “best-case” estimate from the DR plan’s architect.
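As a rough illustration of turning test results into a recovery estimate, the sketch below compares measured drill durations against a stated RTO. The drill times, the four-hour RTO, and the function name `summarize_drills` are all hypothetical values chosen for the example, not figures from any real DR plan.

```python
from datetime import timedelta
from statistics import median


def summarize_drills(durations, stated_rto):
    """Summarize measured recovery times and compare them to the stated RTO."""
    worst = max(durations)
    return {
        "median": median(durations),
        "worst": worst,
        # Judge the plan by its slowest drill, not its best case.
        "meets_rto": worst <= stated_rto,
    }


# Hypothetical recovery times measured in three DR drills.
drills = [
    timedelta(hours=3, minutes=10),
    timedelta(hours=4, minutes=45),
    timedelta(hours=2, minutes=50),
]

summary = summarize_drills(drills, stated_rto=timedelta(hours=4))
print(summary)
```

In this made-up data set, the typical drill finishes within the objective, but the slowest one does not, which is exactly the kind of gap a best-case estimate would hide.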
Ultimately, the biggest problem is that an untested DR plan may fail at the very moment an emergency strikes.
To control risks, businesses must regularly test their disaster recovery strategy to find potential fault points and correct them before an actual emergency strikes.
Testing a DR Plan
So, how can a business test its DR plan to check for potential faults? There are actually a number of ways to test your DR solution, some more practical than others:
Testing the Replication Environment. Rather than bringing the primary environment completely down, simply spin up the replication environment. This approach avoids disrupting production, yields a reliable estimate of actual recovery time, and establishes whether the replication environment can handle the load.
Key elements that your test should include:
- Testing RPO & RTO
- Prioritizing workloads – which mission-critical systems have to come back up first in a disaster?
- Proper DNS failover testing
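The three checks in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete test harness: the `Workload` class, the workload names, the objectives, the timestamps, and the hostname and IP address are all hypothetical. The DNS check uses the standard library's `socket.gethostbyname` by default, but the example passes in a stub resolver so that it runs without network access.

```python
import socket
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Workload:
    name: str
    priority: int      # 1 = restore first
    rto: timedelta     # maximum acceptable downtime
    rpo: timedelta     # maximum acceptable window of lost data


def recovery_order(workloads):
    """Return workload names in the order they should be restored."""
    return [w.name for w in sorted(workloads, key=lambda w: w.priority)]


def check_objectives(workload, failure_at, last_replicated_at, restored_at):
    """Compare the measured data loss and downtime to the workload's RPO and RTO."""
    data_loss = failure_at - last_replicated_at
    downtime = restored_at - failure_at
    return {"rpo_met": data_loss <= workload.rpo, "rto_met": downtime <= workload.rto}


def dns_points_to(hostname, expected_ip, resolve=socket.gethostbyname):
    """Check that the hostname resolves to the expected (DR site) address."""
    return resolve(hostname) == expected_ip


# Hypothetical workloads and objectives.
workloads = [
    Workload("reporting", 3, timedelta(hours=24), timedelta(hours=12)),
    Workload("order-db", 1, timedelta(hours=1), timedelta(minutes=5)),
    Workload("web-frontend", 2, timedelta(hours=2), timedelta(hours=1)),
]
order = recovery_order(workloads)

# Hypothetical timestamps recorded during a test.
failure = datetime(2024, 1, 1, 2, 30)
result = check_objectives(
    workloads[1],
    failure_at=failure,
    last_replicated_at=failure - timedelta(minutes=3),
    restored_at=failure + timedelta(minutes=90),
)

# Stub resolver standing in for a real DNS lookup after failover.
dns_ok = dns_points_to("app.example.com", "203.0.113.10",
                       resolve=lambda host: "203.0.113.10")
print(order, result, dns_ok)
```

In this invented scenario the database loses only three minutes of data, so its RPO is met, but it takes 90 minutes to restore against a one-hour RTO, so the test would flag the recovery time as a gap to fix.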
Shutting Down the Primary Environment. This is a potentially dangerous simulation that any business can try, but it does showcase exactly how a real disaster may affect the company. This method will also expose the deficiencies in your existing DR strategy. However, it is the most disruptive to the business.
The specific nature and steps involved in testing your DR solution may vary based on your business's DR plan and setup, but the approaches above cover the broad-strokes strategies.
Best-in-class DR solutions will withstand the rigors of heavy testing and even provide helpful tools and resources to make running such tests easier on businesses.