The OGO Blog

What is a Single Point of Failure?

Have you been reviewing your disaster recovery (DR) plan, checking its robustness, and trying to mitigate failure points? If so, this post is for you.

First off, what is a single point of failure?

A general definition is: any part of a system that has no redundancy and whose failure would cause the whole system to fail. What does that mean in practice? Your point of view might depend on which department of your organization you work in. IT professionals might think of that pesky router whose failure would cut off internet access for the whole office. This is a common and very apt example, but the key is to look beyond the obvious IT hardware that is typically associated with single points of failure.

What else can be a single point of failure?

As the definition states, any critical part of a system that does not have redundancy is a single point of failure. There are probably three words in that sentence that make you think of technology and computer hardware, but there are plenty of other parts of your “system” that could fail and cause problems.

If you have only one person who can control a critical server, then that person is a single point of failure. If that person suddenly has to take an extended leave or has something unfortunate happen to them, you and your organization will not be able to complete the tasks associated with that critical server until you replace the missing resource. On the other hand, if you had previously insisted that the key person train another individual or two, your system could continue to function at some level without the original resource. You would have built redundancy into your staff and eliminated a single point of failure.
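The staffing example can be checked mechanically. Here is a minimal sketch in Python, assuming you keep a simple record of who is able to perform which task; the names `alice`, `bob`, and the task labels are hypothetical and only for illustration.

```python
from collections import defaultdict

def find_single_person_tasks(skills):
    """Return each task that only one person can perform, mapped to that person."""
    people_by_task = defaultdict(set)
    for person, tasks in skills.items():
        for task in tasks:
            people_by_task[task].add(person)
    return {
        task: next(iter(people))
        for task, people in people_by_task.items()
        if len(people) == 1
    }

# Hypothetical staff record: only alice can run the critical server.
skills = {
    "alice": {"critical-server", "backups"},
    "bob": {"backups"},
}
print(find_single_person_tasks(skills))  # {'critical-server': 'alice'}

# After alice trains bob, the single point of failure is gone.
skills["bob"].add("critical-server")
print(find_single_person_tasks(skills))  # {}
```

Any task that shows up in the result is a candidate for cross-training.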

Examples abound once you start thinking outside the box on this matter.

How can I find and eliminate single points of failure?

A thorough audit of your systems (using systems in the universal sense) can reveal single points of failure, but even then, some points are not so obvious. This would be a great time to bring in a DR consultant to help you find the typical pitfalls of your industry.
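For the parts of a system that can be drawn as a dependency map, one piece of such an audit can be sketched in code: remove each component in turn and check whether the business function still works. The Python below is a rough illustration under that assumption, not a substitute for a full audit; the component names (`office`, `switch`, `router`, `isp-a`, `isp-b`, `internet`) are made up for the example.

```python
from collections import deque

def is_reachable(links, start, goal, removed=None):
    """Breadth-first search from start to goal, skipping the removed component."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def single_points_of_failure(links, start, goal):
    """List components whose removal alone disconnects start from goal."""
    components = set(links) | {n for targets in links.values() for n in targets}
    components -= {start, goal}
    return sorted(
        c for c in components
        if not is_reachable(links, start, goal, removed=c)
    )

# Hypothetical office network: two ISPs, but one switch and one router.
links = {
    "office": ["switch"],
    "switch": ["router"],
    "router": ["isp-a", "isp-b"],
    "isp-a": ["internet"],
    "isp-b": ["internet"],
}
print(single_points_of_failure(links, "office", "internet"))  # ['router', 'switch']
```

In this made-up network the two ISPs back each other up, so neither is reported, while the lone switch and router are. A check like this only finds failure points that appear in the map, which is one reason the less obvious ones still need a human review.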

Obviously, the specific method of mitigation will vary wildly, depending on the system we are talking about. The most straightforward answer is to provide a secondary method of performing your necessary business function. As always, there is a balance between quality and cost of redundancy, but even knowing what your single points of failure are puts you ahead of the pack.
