+44 01865-600-733

+1 713-636-5656

The Road to Hell is paved with Single Points of Failure

Single Point of Failure depicted by Chain Links

According to Wikipedia: "A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working."

A Business Continuity Plan contains many vitally important elements. Addressing Single Points of Failure is one of the most crucial. Identifying and addressing choke-points and soft spots in your organizational process, before catastrophe strikes, could save your company.

In this strategy guide I’ll take an in-depth look at business process SPOFs and provide some useful tactics to stop them bringing your operation to a standstill. Tackling SPOFs forms the backbone of any Business Continuity Plan (BCP) strategy you develop.

The simple fact is that a great many companies across America and Europe are at risk of severe operational outages caused by such single points of failure, or choke points, in their logistical and informational infrastructure. Worse still, most organizations are blissfully unaware that these fail-points even exist until a single, unforeseen event brings their entire business to a standstill.

How this happens...

On Armament's main business continuity planning page we outlined how a Baltimore-based manufacturing company was brought to a halt by a tropical storm eight thousand miles away. However, it doesn’t necessarily take a freak typhoon in the Western Pacific to expose a catastrophic weakness in a business’s supply or production infrastructure. Depending on the size and scope of an organization, single points of failure can include equipment malfunction, server downtime, prolonged Internet outages, or even the absence of one specialist staff member¹.

Obviously such circumstances present considerable challenges from the Business Continuity perspective, because if the problem component fails, so does the entire operation.

Awareness is Key

Because SPOFs come in so many shapes and sizes, there’s no single way to combat them. Consequently you need to first diagnose and categorize which parts of your business process constitute bottlenecks and chokepoints, before you’re able to devise a strategy to remedy them or at least mitigate their impact.

Tractor Assembly Line

Some fail-points are unavoidable, such as crucial manufacturing equipment. If you only run a single assembly line you’re not likely to keep a spare one handy just in case the first one breaks. You just have to accept that there’s an ever-present risk of equipment failure, because the costs of eliminating the fail-point outweigh the risks.

Other points of failure exist simply because something got overlooked. Examples include neglecting to fit uninterruptible power supplies or backup power sources to crucial equipment, or failing to make your server access credentials available to senior management, in case your IT guy gets run over by a bus.

Diagnose. Categorize. Remedy.

Risk Assessments are a challenge for most companies, for a number of reasons. If you’re looking to delegate this task in-house, consider the following pitfalls:

  1. Your Processes have been Functioning for Years
    And because things have been running smoothly for such a long time, there’s no reason for you to suppose that something can or will go wrong.
     
    This type of attitude fosters complacency when carrying out a risk assessment on your own processes, increasing the likelihood of a crucial fail-point being missed.
     
  2. You Know the Risks Already
    Of course there’s a chance that your manufacturing equipment will break down. It’s just a matter of common sense and it’s a risk you’re forced to live with.
     
    While an awareness of your company’s unavoidable SPOFs is of course an asset, it can also be a hindrance. If you’ve simply accepted a single point of failure as a reality of life, you’re not thinking broadly enough. Certainly, the risk may be inherent. Nevertheless, there may be ways and means to minimize its chances of failure or to mitigate its overall impact if things should go wrong.
     
  3. You’re Intimately Familiar with the Process, from Beginning to End
    Nobody knows the business process like you do and you’ve been working with the system for years. If there was a weakness, you’d know about it.
     
    The more intimately familiar you are with your company’s processes from a managerial standpoint, the harder it becomes to pinpoint the details that could bring your operation crashing down instantly. It’s a similar situation to proofreading your own writing. You already know what you’re saying, so your mind automatically glosses over minor errors and omissions. In a risk assessment it’s those minor errors and omissions which are likely to come back and take you down later.
     

A WARNING: Be aware that it’s easy for corporate politics to enter the frame. Risk Assessments are about finding the soft spots in your company’s armor, not about finger-pointing or covering up for anyone. Be sure that whoever you delegate this task to is not only up to the job, but also neutral enough to stay objective when attacked for their findings and recommendations.

Diagnose

Most of the time, SPOFs are out of sight and out of mind, tucked away in a quiet niche of the process flow where stakeholders and employees won’t look for them. That’s why it usually takes an independent expert eye, without any preconceived ideas, to conduct a truly thorough risk assessment and identify all critical fail-points and operational risks.
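One way to make the diagnosis step more systematic is to write the process down as a simple flow and test what happens when each step is taken away. The Python sketch below is only an illustration under stated assumptions: the process map, the step names, and the helper functions are all hypothetical, and a real assessment involves far more than a reachability check.

```python
from collections import deque


def reachable(graph, start, goal, removed=None):
    """Return True if goal can still be reached from start with one step removed."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, start, goal):
    """List every intermediate step whose loss cuts the flow from start to goal."""
    steps = set(graph) | {n for targets in graph.values() for n in targets}
    candidates = steps - {start, goal}
    return sorted(s for s in candidates
                  if not reachable(graph, start, goal, removed=s))


# Hypothetical process map: each step lists the steps that follow it.
process = {
    "order": ["supplier_a", "supplier_b"],
    "supplier_a": ["assembly_line"],
    "supplier_b": ["assembly_line"],
    "assembly_line": ["warehouse"],
    "warehouse": ["delivery"],
}

spofs = single_points_of_failure(process, "order", "delivery")
print(spofs)
```

In this made-up example the two suppliers back each other up, so neither is flagged, while the single assembly line and the single warehouse both are.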

Categorize

Once you have your list of SPOFs, you need to break them down into the following three categories, according to how readily each one can be remedied:

  1. Easy to Fix
    This includes any fail-point which can be eliminated directly, in a short timeframe, and at reasonable cost.
     
  2. Hard to Fix
    Any single point of failure which cannot be directly remedied, but for which a workaround or partial mitigation can be developed, is classed as a hard fix.
     
  3. Impossible to Fix
    As the name implies, this category is reserved solely for fail-points and risks without a reasonable remedy or workaround.
     

Easy to Fix problems will usually be at the top of your list. However, if you discover a Hard to Fix problem with a higher risk and more severe business impact, this should be tackled as a matter of priority. Depending on the severity of the problems discovered during your risk assessment, this stage of the operation can take a lot of time and effort.
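The categorizing and prioritizing described above can be kept in a small register. The following Python sketch is a rough illustration, not a prescribed method: the 1 to 5 likelihood and impact scales, the likelihood-times-impact risk score, and all of the example entries are assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Fix(Enum):
    EASY = 1
    HARD = 2
    IMPOSSIBLE = 3


@dataclass
class Spof:
    name: str
    fix: Fix
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (business-ending)

    @property
    def risk(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritize(spofs):
    """Order fixable SPOFs by risk, highest first; easier fixes win ties."""
    fixable = [s for s in spofs if s.fix is not Fix.IMPOSSIBLE]
    return sorted(fixable, key=lambda s: (-s.risk, s.fix.value))


# Hypothetical register entries.
register = [
    Spof("Server credentials held by one person", Fix.EASY, 2, 4),
    Spof("Sole supplier of a specialized component", Fix.HARD, 3, 5),
    Spof("No backup power on crucial equipment", Fix.EASY, 3, 3),
    Spof("Single assembly line", Fix.IMPOSSIBLE, 2, 5),
]

plan = prioritize(register)
accepted = [s for s in register if s.fix is Fix.IMPOSSIBLE]

for item in plan:
    print(f"{item.risk:>2}  {item.fix.name:<4}  {item.name}")
```

Here the Hard to Fix supplier problem moves ahead of both Easy to Fix items because its assumed risk score is higher, while the Impossible to Fix entry is set aside as an accepted risk.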

A trained expert will often spot solutions and workarounds which those close to the operation might not immediately consider. It’s often possible to eliminate or mitigate even difficult SPOFs, with some brainstorming and creativity. The trick here is to persevere, think laterally, and follow through on all tabled ideas, even if they initially seem far-fetched.

Remedy

After you’ve categorized and prioritized every single point of failure, and you’ve created a plan to remedy or work around each one, it’s time to implement your fixes. The only way to deal with Single Points of Failure is to CREATE REDUNDANCY. You need to implement failover systems to fall back on, should your primary system break down.

Top-Down Image of Cluttered Warehouse

For instance, if your warehouse manager is the only one with the access credentials to your logistics software, an illness or accident will cause an operational outage for your company. The easiest way to create a redundancy would be to keep the access credentials in safe storage and train another staff member to handle the system. That way, if your primary operator goes missing for any reason, you can simply switch to the secondary.
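The same primary-and-secondary idea carries over to automated systems. As a minimal sketch, assuming a task that can be handed to any one of several interchangeable handlers, failover can look like the Python below. The function names and the simulated outage are hypothetical and exist only for illustration.

```python
def run_with_failover(task, operators):
    """Try each operator in order and return (name, result) from the first success."""
    errors = []
    for name, handler in operators:
        try:
            return name, handler(task)
        except Exception as exc:
            # Record the failure and fall through to the next operator.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all operators failed: " + "; ".join(errors))


def primary(task):
    # Simulated outage of the primary operator (hypothetical).
    raise ConnectionError("warehouse manager unavailable")


def secondary(task):
    return f"{task} handled by trained backup"


used, result = run_with_failover(
    "dispatch order",
    [("primary", primary), ("secondary", secondary)],
)
print(used, "->", result)
```

If every operator in the list fails, the function raises an error rather than failing silently, which is a reminder that a secondary only helps if it is itself kept in working order.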

If you’re dealing with unavoidable chokepoints in your supply chain, like a single manufacturer producing a specialized component, you’ll want to raise stock levels and source a backup producer to step in if your primary suffers an outage, as our Baltimore electronics manufacturer should have done. In this manner your company is more likely to survive without operational downtime, should anything happen to your existing supplier.

Remedial actions are as varied as SPOFs, and your own workarounds and strategies will very much depend on the nature of your organization and on the specific chokepoints you discover.

Don't put off creating a Business Continuity Plan

Conducting Risk Assessments and producing a Business Continuity Plan in-house is a protracted process, requiring considerable company time and resources. However, when things go wrong without warning, a thorough BCP can save your company from a costly outage or even from outright business failure, so it should be treated as an investment, not a cost.

It's essential to implement a full Business Continuity Plan that outlines both your choke points and your workarounds. Because the Business Continuity Planning process is a lengthy and complex one, many organizations opt to contract a professional agency for this task.

What else should a Business Continuity Plan contain?

BCPs are not by nature wordy. They are concise and to the point, focusing mainly on fail-points and solutions. In addition to these, it’s essential to include the following:

  1. List of Critical Phone Numbers/Contact Details
    This includes emergency plumbers and electricians, IT support, essential vendors and suppliers, legal, insurance & financial support, and any other personnel or organization that’s integral to your organization’s functionality.
     
  2. List of Critical Processes
    Establish and list the bare-bones, minimal processes required to retain basic operational functionality for your organization. If something goes wrong, you’ll need this list to focus efforts on retaining or restoring crucial business components.
     
  3. List of Critical Staff Members and Contact Details
    If something goes wrong, you don’t want to hunt around to see who is responsible for which component or process. Establish your operational hierarchy and list out the relevant staff members, together with their contact details.
     
  4. Hardware and Software Inventory
    All hardware fails, on a long enough time scale. When it does it often takes your software out with it. You’ll need a list that details systems and programs, both for insurance purposes, and to know what replacements to purchase.
     
  5. Printouts and Digital Copies of your BCP
    If your BCP is digital and your computers fail, you’re up the proverbial creek.
     

You may be surprised how many organizations go through the time and trouble of creating a business continuity plan and only retain digital copies. THIS IS ACTUALLY A SINGLE POINT OF FAILURE IN ITSELF.
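The checklist above lends itself to a quick completeness check. The Python sketch below assumes a BCP summarized as a plain dictionary; the key names, the two required formats, and the sample plan are all hypothetical, chosen only to mirror the five items listed.

```python
# Hypothetical section keys mirroring the first four checklist items.
REQUIRED_SECTIONS = {
    "critical_contacts",
    "critical_processes",
    "critical_staff",
    "hardware_software_inventory",
}
# The fifth item: the plan should exist both on paper and digitally.
REQUIRED_FORMATS = {"printed", "digital"}


def bcp_gaps(plan):
    """Return a list of human-readable gaps found in a BCP summary."""
    gaps = [
        f"missing or empty section: {name}"
        for name in sorted(REQUIRED_SECTIONS)
        if not plan.get(name)
    ]
    missing_formats = REQUIRED_FORMATS - set(plan.get("copies", []))
    gaps += [f"no {fmt} copy" for fmt in sorted(missing_formats)]
    return gaps


# Hypothetical plan summary with two deliberate gaps.
plan_summary = {
    "critical_contacts": ["IT support", "electrician"],
    "critical_processes": ["order intake"],
    "critical_staff": ["operations lead"],
    "hardware_software_inventory": [],
    "copies": ["digital"],
}

gaps = bcp_gaps(plan_summary)
for gap in gaps:
    print(gap)
```

For this sample the check reports the empty inventory and the missing printed copy, the latter being exactly the digital-only trap described above.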


Sources:

¹ This happens on larger scales too. One spectacular example of an SPOF at the national level occurred in 2016, when the failure of the Nipigon River Bridge temporarily severed the road link between Eastern and Western Canada, due to the complete lack of a detour route.



 Standards

NFPA 1600 & NIST 800 Standards

Armament Solutions

Suite 103, 2163 Lima Loop

Laredo, TX, 78041

United States of America



 

International House, 61 Mosley Street

Manchester, M2 3HZ

United Kingdom
