Top Five Disaster Recovery Testing Issues

If you are responsible for Disaster Recovery Testing at your organization or Credit Union, this post is for you. Disaster Recovery Testing is the process of simulating a recovery of your critical systems, business processes, and data to validate that you can meet your Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). In more basic terms, it is the process of making sure that your company could stay operational if something bad happened. It is important to note that there is a big difference between Disaster Recovery Testing and Disaster Recovery Exercises. With that said, there are five main issues that we regularly see when assisting our clients with Disaster Recovery Tests.

Local Network Issues

This is a big issue for almost all of our clients. Everyone wants to make sure that all of their locations could be functional in an actual event. To accomplish this, you must create a separate network to isolate and securely validate the testing. This step is generally overlooked and is much more challenging than you would expect. Plus, all of this extra work means the test no longer mirrors how the recovery would work in a real disaster. Even so, if you have the right network skills on staff and the right switches and routers to separate the traffic, this is a great way to involve staff at other locations in the testing process while making sure you don’t accidentally use the test systems for real work and vice versa. One example is when you need to change the IP address of a server in your Test Environment. Often this triggers the need to change the IP addresses of countless other servers, routes, and configurations. This can consume precious time in troubleshooting and always seems to lead to a duplicate IP address problem!
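One way to catch the duplicate IP address problem before the test starts is to check the planned test-environment addresses up front. The following is only a minimal Python sketch under assumptions: the function name find_duplicate_ips, the hostnames, and the addresses are all made up for illustration, and your real plan would come from your own documentation or IP address management tool.

```python
from collections import defaultdict
from ipaddress import ip_address


def find_duplicate_ips(plan):
    """Return {ip: [hostnames]} for every address assigned more than once."""
    hosts_by_ip = defaultdict(list)
    for hostname, address in plan.items():
        # ip_address() normalizes the text and raises ValueError on a typo.
        hosts_by_ip[ip_address(address)].append(hostname)
    return {
        str(ip): sorted(hosts)
        for ip, hosts in hosts_by_ip.items()
        if len(hosts) > 1
    }


# Hypothetical test-environment plan; hostnames and addresses are invented.
test_plan = {
    "core-db": "10.99.0.10",
    "file-01": "10.99.0.11",
    "app-01": "10.99.0.10",  # collides with core-db
}

print(find_duplicate_ips(test_plan))
```

Running a check like this against the re-addressing plan is much cheaper than discovering the conflict in the middle of a recovery window.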

Server Recoveries

If you are still using tapes or an older data vaulting product to do your recoveries, this can be really painful. Avoid products that require hardware matching your production hardware, and instead use software that incorporates Physical to Virtual conversion. Most modern electronic vaulting products do this. Ideally you want something that can do a Bare Metal Restore so you can simply point and click through the recovery. Even so, we find there is always a server or two that needs extra care to recover, and folks often don’t understand that most recoveries and tests are linear. You can’t just fire off 57 server recoveries at once, because modern Disk I/O configurations can’t move or recover that much data at the same time. Instead, be deliberate and plan out the recovery order based on your Business Impact Analysis. We find we can usually recover most servers in a few hours if the recovery is planned correctly. Don’t forget to also plan for network changes, such as IP changes or routing configurations, for your test.
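The idea of recovering in deliberate, ordered batches rather than all at once can be sketched in a few lines of Python. This is an illustration only: the function plan_recovery_waves, the server names, the priority numbers, and the max_parallel cap are assumptions, and the right cap for your environment depends on what your storage and recovery product can actually sustain.

```python
def plan_recovery_waves(servers, max_parallel=4):
    """Group servers into sequential waves, most critical first.

    servers: list of (name, bia_priority) tuples; a lower number is more critical.
    max_parallel: assumed cap on how many restores run at the same time.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # sorted() is stable, so servers with equal priority keep their listed order.
    ordered = sorted(servers, key=lambda server: server[1])
    names = [name for name, _ in ordered]
    return [names[i:i + max_parallel] for i in range(0, len(names), max_parallel)]


# Hypothetical servers with priorities taken from a Business Impact Analysis.
servers = [
    ("file-01", 3),
    ("core-db", 1),
    ("mail-01", 2),
    ("intranet", 4),
    ("dc-01", 1),
]

for number, wave in enumerate(plan_recovery_waves(servers, max_parallel=2), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Each wave starts only after the previous one finishes, which keeps the load on disk I/O predictable and makes it obvious which server is holding things up.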

Connecting to Outside Parties

As with the LAN testing issues, validating that you will be able to reroute your critical outside VPNs and third-party connections is important. The reality is that most providers aren’t equipped with the network staffing resources or the standard processes to make this easy. Often we find that all we can really do is a simple ping test. Fortunately, we have found this is not a big issue in actual recoveries. In our experience, third-party providers bring their “A” teams to actual events, and things that take 8 hours in a test take 15 minutes when the right people are on the phone. So, if you aren’t willing to pay your vendors the big bucks to simulate this and have the “A” team involved, rest assured that a simple ping test will probably be just fine and should satisfy the auditors.
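A simple connectivity check of this kind can be scripted so the results are easy to hand to an auditor. The Python sketch below is an assumption-laden illustration: it substitutes a TCP connection attempt for an ICMP ping (a true ping usually needs elevated privileges), and the partner names, addresses, and ports are placeholders rather than real endpoints.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_partners(endpoints, probe=tcp_probe):
    """Map each partner name to True (reachable) or False (not reachable)."""
    return {name: probe(host, port) for name, (host, port) in endpoints.items()}


# Hypothetical third-party endpoints; replace with the addresses your
# providers give you for the recovery site.
endpoints = {
    "card-processor": ("192.0.2.10", 443),
    "ach-gateway": ("192.0.2.20", 443),
}
```

Calling check_partners(endpoints) from the recovery site returns a dictionary of results per partner. The probe argument is there so the check can be swapped for a real ping wrapper or a stub without changing the rest of the script.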

Testing Business Processes

A lot of clients try to test just a server recovery or a couple of applications. While this might be fine for the Technology Department, we strongly believe that you should have other people involved in every test and have them run a normal business process, such as Accounts Payable or, for our Credit Union clients, posting ACH. This serves two major purposes. First, it makes sure you have all the right tools at your Hot Site to recover the process. Second, it helps the non-technical staff really understand how long something will take in a disaster. This is key for setting realistic expectations and can also be great for freeing up budget resources to solve technology problems correctly.

Being Afraid To FAIL

We have seen countless clients never get around to doing real testing or setting up a self-improving business continuity process because a CEO or Board has the leadership afraid of failing. The reality is that if you want a process that will actually work and well-trained employees, you should be pushing the limits every year. Sure, in your first test you don’t want to attempt a live failover, but each year you should design the test to be 80% achievable with existing processes that you know will work and then push the boundaries on the other 20%. Perhaps do the test without a key individual, or take the 20 people involved and isolate them in 4 different rooms to simulate real-world confusion. The worst-case scenario is that you will have a good cry and feel frustrated, but most likely you will learn a great deal about your team, communication strategies, and what actually works. Recognize that a disaster recovery test is a much better place to have a massive failure than the real world!

Welcome to the Ongoing Operations blog archive.

For our most up-to-date information, please visit ongoingoperations.com.
