Amazon cloud failure – non-tech CAUSE, but tech REASON
[UPDATED 7:45 – NETFLIX DID PURCHASE HIGHER AVAILABILITY SERVICES FROM AMAZON.]
On June 21, I wrote a post entitled The decidedly non-tech reasons for computer system failures.
Perhaps I shouldn’t have used the word “reasons.”
Let me explain.
Several services, including Netflix, Instagram, and Pinterest, experienced a failure on Friday night. The common factor? All used Amazon services. As VentureBeat observed, the affected Amazon services were based in Northern Virginia. This was significant:
Amazon’s service health dashboard indicates that there are power issues in its North Virginia data center, most likely caused by severe storms in the region.
So the outage was caused by the weather. This is in line with what IBM found; as I noted in my prior post, IBM data suggested that 54% of all disaster declarations were caused by the weather.
Fair enough. But when one of my Facebook friends shared the VentureBeat story, he added the following comment:
if these systems went down then they didn’t replicate datacenters, and they didn’t do proper fail over, then they suck, and deserve to be down.
Now it would be overkill for every system to have five nines (99.999%) availability. I certainly don’t have RAID drives installed on my netbook (although there’s a forthcoming story in tymshft that talks about RAID on a personal computer). Now a computer-aided dispatch system? THAT system needs five nines availability, and my former employer Motorola offers CAD systems that meet (PDF) that availability requirement.
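To put a number on "five nines": it means being up 99.999% of the time, which leaves a budget of roughly five and a quarter minutes of downtime per year. Here is a rough sketch of that arithmetic. The function name and the choice of an average 365.25-day year are my own assumptions for illustration, not anything taken from Amazon's or Motorola's documentation.

```python
# Assumption: an "average" year of 365.25 days, so leap years are smoothed in.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of `nines` nines.

    Two nines is 99%, three nines is 99.9%, and so on.
    """
    unavailability = 10.0 ** (-nines)  # e.g. five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in range(2, 6):
        availability_pct = 100.0 * (1.0 - 10.0 ** (-nines))
        minutes = allowed_downtime_minutes(nines)
        print(f"{availability_pct:.3f}% available: "
              f"about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the downtime budget by a factor of ten, which is a big part of why each extra nine costs so much more.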
Does Netflix require five nines availability? Well, that depends upon its users. I’d be willing to bet that most Netflix users haven’t thought about five nines availability. In the pre-Netflix days, back when people used to congregate in buildings called “movie theaters,” we didn’t necessarily think about five nines availability either.
In addition, I don’t know if Amazon promised five nines to Netflix and the other companies. My bet is that they didn’t promise it. My guess is that Amazon offered a higher availability option to the companies, and the companies rejected the offer for price reasons, sticking with a lower availability level. What could go wrong?
[UPDATE 7:45 – NETFLIX PURCHASED A SETUP THAT WAS NOT DEPENDENT UPON A SINGLE AMAZON SITE. FOR DETAILS, SEE MY MOST RECENT POST IN MY EMPOPRISE-BI BUSINESS BLOG, Today’s “I was wrong” – Netflix, Amazon, and availability.]
In my view, Amazon is going to get a lot of blame over the next few days because of this – heck, I titled this very post “Amazon” instead of mentioning Netflix, Pinterest, et al. However, the blame (if any is deserved) should go to the companies who didn’t buy a high availability package – either from Amazon, or from Amazon in combination with another service.
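Why would a second site help? Here is a back-of-the-envelope sketch, under a big assumption: that the sites fail independently of each other. The 99.9% figure is made up for illustration and is not an Amazon number, and a storm that covers a whole region could easily break the independence assumption.

```python
def combined_availability(site_availabilities):
    """Availability of a service that stays up as long as at least one site is up.

    Assumes site failures are independent, which real outages may not respect.
    """
    probability_all_down = 1.0
    for availability in site_availabilities:
        probability_all_down *= (1.0 - availability)
    return 1.0 - probability_all_down


if __name__ == "__main__":
    one_site = combined_availability([0.999])          # hypothetical single site
    two_sites = combined_availability([0.999, 0.999])  # same site, duplicated elsewhere
    print(f"one site:  {one_site:.6f}")
    print(f"two sites: {two_sites:.6f}")
```

On paper, two hypothetical three-nines sites add up to six nines. That only holds if the failover between them actually works when it is needed.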
Weather was the cause of the outage – penny-pinching was the reason for it.