Are RTOs (Recovery Time Objectives) Misleading?

It’s been said that time is money, and that certainly holds true when you’re talking about down systems. Today’s consumers expect 24x7x365 access to their accounts while having little to no understanding of the complexity and costs involved in meeting that goal. But you’re committed to your members, so you’ve performed your Business Impact Analysis, developed your RTOs, and built an infrastructure that supports quick recovery of critical systems. So why are you still unable to achieve your uptime goals when systems or services are disrupted? To answer the question, we’ll have to look at the definition.

Recovery Time Objective (RTO) is defined as how much time can pass from when a disaster occurs at your organization to when you need to be back up and running again. Sounds pretty straightforward, right? Not so fast. Two very important time periods are often overlooked when calculating the RTO. Notice that the definition says “from when a disaster occurs”, not from when someone notices it or declares it. Let’s use an example scenario to see how calculations can come up short.

Assumptions & Discovery

Assume an RTO of under 4 hours for critical systems (fairly normal for most organizations). Fire strikes your corporate headquarters at 10:00 PM, taking out all communications and systems. It is reported at 10:45 PM by a passing motorist. Your leadership team is notified and gathers at a nearby branch at midnight to decide next steps. A disaster is declared at 12:30 AM. Your DR provider is notified and begins recovery of critical systems; contractually, they have 4 hours from notification. So far your systems have been down 2.5 hours. Add to that the 4 hours your DR provider has and guess what? Minimum recovery time is now 6.5 hours, well over your desired 4-hour threshold! When calculating RTO, it is imperative to consider the time gaps associated with discovering, reporting, and responding to an event.
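The arithmetic in this scenario can be checked with a few lines of Python. This is only a sketch of the timeline described above: the calendar dates are arbitrary placeholders, the variable names are made up for illustration, and it assumes the DR provider finishes exactly at the end of its 4-hour contractual window.

```python
from datetime import datetime, timedelta

# Timeline from the fire scenario; the calendar dates are arbitrary placeholders.
outage_start = datetime(2024, 1, 1, 22, 0)       # fire strikes at 10:00 PM
disaster_declared = datetime(2024, 1, 2, 0, 30)  # disaster declared at 12:30 AM
provider_sla = timedelta(hours=4)                # contractual window, starts at notification
rto_target = timedelta(hours=4)                  # RTO taken from the BIA


def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


# Time lost to discovery, reporting, and decision-making before recovery begins.
pre_recovery_gap = disaster_declared - outage_start

# Best case: the provider finishes exactly at the end of its contractual window.
minimum_downtime = pre_recovery_gap + provider_sla

print(f"Gap before recovery starts: {hours(pre_recovery_gap):.1f} h")
print(f"Minimum total downtime:     {hours(minimum_downtime):.1f} h")
print(f"Over RTO target by:         {hours(minimum_downtime - rto_target):.1f} h")
```

Run against your own incident timelines, the same subtraction shows how much of the RTO is consumed before recovery work even starts.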

Balancing and System Checks

Another area often overlooked when determining a true RTO (or at least one that both IT and the business side will agree on) is the gap between the time systems are technically recovered and the time they are fully operational again. IT may be able to restore hardware, software, and communications within a certain time, but after an unexpected disruption, systems must also be tested to ensure the integrity of the data. This process often includes manual updates and data entry to recover lost transactions. Depending on the length of the initial downtime, this effort could take several more hours. It’s easy to see how RTOs are often miscalculated, misleading your organization into believing recoveries can occur in the expected timeframe.
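One way to make this gap visible is to work backwards from the RTO and see how much time is actually left for the technical restore. The sketch below is illustrative only: the function name is made up, and the detection and validation figures are hypothetical inputs. Only the 4-hour RTO comes from the example scenario.

```python
def restore_budget_hours(rto, detect_and_declare, validate_and_reenter):
    """Hours left for the technical restore once the other gaps are subtracted from the RTO."""
    return rto - detect_and_declare - validate_and_reenter


# Hypothetical inputs, in hours; only the 4-hour RTO comes from the example scenario.
budget = restore_budget_hours(rto=4.0, detect_and_declare=1.0, validate_and_reenter=1.5)
print(f"Time available for the technical restore: {budget:.1f} h")

if budget <= 0:
    print("The RTO cannot be met no matter how fast IT restores systems.")
```

If the number that comes out is smaller than what your DR provider is contractually allowed, the RTO and the contract do not agree with each other.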

Basing RTOs on Controlled Tests

Another, perhaps more common, source of misleading RTOs is vendors who base their figures on planned outages or exercises. Trust me, “recovery” from a planned outage is nothing more than a graceful shutdown and restoration, and it will without a doubt yield low RTO values. But what happens during an unplanned event? Today’s complex infrastructures and recovery sites rely on highly synchronized and tightly integrated processes, hardware, software, communications, and even people. There is a huge difference between a ‘planned switch’ and an ‘unplanned switch’! Due diligence calls for your service provider to supply both planned and unplanned RTO values. And if they are equal, ask for the test results.

With this new information in hand, pull out your Business Impact Analysis (BIA) and review the RTOs. Are they adequate? By accounting for the additional variables above, you’ll have a greater chance of meeting your recovery goals. Fortunately, once you have derived a reasonable RTO value, there are many strategic options you can consider to further strengthen your recovery efforts. For more ways to improve your RTOs, check out our recent blog post by Ongoing Operations CEO Kirk Drake, “How Do I Improve my RTO, Recovery Time Objective?”
