The decidedly non-tech reasons for computer system failures
I recently read an IBM white paper entitled “Virtualizing disaster recovery using cloud computing.” The white paper, which mentions the advantages of using something such as IBM SmartCloud™ Virtualized Server Recovery, is available in PDF form from this link.
One of the statistics cited in this white paper documents the reasons that a disaster recovery system had to be activated. As IBM notes, this information came from its own customers.
Think of the complex systems that enterprises deploy today. In many cases, they are heterogeneous systems built from a variety of components from a variety of vendors. Even if all of the components come from the same vendor – a vendor such as IBM (or Oracle, or whoever) – the interfaces between the components often result in maddening complexity. In these cases, it’s no wonder that even the best-designed systems fail at times. In fact, according to statistics from IBM’s own customers, hardware and software issues account for 17% of disaster recovery system activations.
Huh?
That’s right. For these complex systems, 83% of all disaster recovery activations are NOT hardware or software related.
So while it’s important to get your hardware and software working properly, chances are that when you do have to declare a disaster, it will have nothing to do with the hardware and software in the system.
So why do systems fail?
Remember that the technology of a system is not limited to the hardware and software in the system. Every computing system needs power to run, and if you lose the power, you lose the system. In fact, IBM’s customer data reveals that 19% of disaster recovery activations are caused by a power failure.
Well, you can obviously design your system to guard against power failure with batteries and the like. But even if your hardware and software are resilient, and even if your power is resilient, there’s one other factor that could result in the declaration of a disaster – and this factor is the cause of 54% of all disaster declarations.
What is that primary cause?
The weather.
Yes, mundane old weather, the decidedly low-tech disaster that’s been affecting the world since who knows when. IBM explicitly mentions hurricanes, tsunamis, floods, and fires as potential reasons why a disaster recovery system would need to be activated. And these weather-related causes have been destroying facilities and cities for thousands of years.
As Hurricane Katrina shows, we’re no better at avoiding weather-related disasters today than we were 2,000 years ago.
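For anyone who wants to check the arithmetic, here is a rough sketch that tallies the three shares quoted above. The percentages are the ones cited from IBM's customer data; the variable names and the "other" bucket are my own assumptions, since the figures quoted here do not say what makes up the rest.

```python
# Shares of disaster recovery activations, as quoted in this post
# from IBM's customer data (percent of all activations).
shares = {
    "weather": 54,
    "power failure": 19,
    "hardware or software": 17,
}

# Rank the causes from largest share to smallest.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Whatever the three cited causes do not cover. This remainder is an
# assumption-free subtraction, but its contents are not broken down here.
unattributed = 100 - sum(shares.values())

# Share of activations that are not hardware or software related.
non_hw_sw = 100 - shares["hardware or software"]

for cause, pct in ranked:
    print(f"{cause}: {pct}%")
print(f"other (not broken down here): {unattributed}%")
print(f"not hardware or software: {non_hw_sw}%")
```

The three named causes add up to 90%, so roughly a tenth of activations fall outside the categories discussed in this post.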
Good ranking of risks.
Biggest risk – weather.
Second biggest risk – power failure.
Close third risk – hardware or software failure.