tymshft

There is nothing new under the sun…turn, turn, turn

Amazon cloud failure – non-tech CAUSE, but tech REASON

[UPDATED 7:45 – NETFLIX DID PURCHASE HIGHER AVAILABILITY SERVICES FROM AMAZON.]

On June 21, I wrote a post entitled “The decidedly non-tech reasons for computer system failures.”

Perhaps I shouldn’t have used the word “reasons.”

Let me explain.

Several services, including Netflix, Instagram, and Pinterest, experienced a failure on Friday night. The common factor? All used Amazon services. As VentureBeat observed, the Amazon services in question were based in Northern Virginia. This was significant:

Amazon’s service health dashboard indicates that there are power issues in its North Virginia data center, most likely caused by severe storms in the region.

So the outage was caused by the weather. This is in line with what IBM found; as I noted in my prior post, IBM data suggested that 54% of all disaster declarations were caused by the weather.

Fair enough. But when one of my Facebook friends shared the VentureBeat story with his friends, he added the following comment:

if these systems went down then they didn’t replicate datacenters, and they didn’t do proper fail over, then they suck, and deserve to be down.

Now, it would be overkill for every system to have five nines availability. I certainly don’t have RAID drives installed on my netbook (although there’s a forthcoming story in tymshft that talks about RAID on a personal computer). A computer-aided dispatch (CAD) system, on the other hand? THAT system needs five nines availability, and my former employer Motorola offers CAD systems that meet that availability requirement.
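To put “five nines” in concrete terms: 99.999% availability leaves room for only about 5.26 minutes of downtime per year. The short Python sketch below does that arithmetic for two through five nines. It assumes a 365-day year and treats “N nines” as an unavailability of 10^-N; the function name is purely illustrative.

```python
# Minutes in a 365-day year (an assumption: leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines.

    "N nines" is taken to mean availability = 1 - 10**-N, so the
    unavailable fraction of the year is simply 10**-N.
    """
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = allowed_downtime_minutes(n)
        print(f"{n} nines: {minutes:.2f} minutes/year ({minutes / 60:.2f} hours)")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why the last nine or two get expensive quickly.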

Does Netflix require five nines availability? Well, that depends upon its users. I’d be willing to bet that most Netflix users haven’t thought about five nines availability. In the pre-Netflix days, back when people used to congregate in buildings called “movie theaters,” we didn’t necessarily think about five nines availability either.

In addition, I don’t know if Amazon promised five nines to Netflix and the other companies. My bet is that they didn’t promise it. My guess is that Amazon offered a higher availability option to the companies, and the companies rejected the offer for price reasons, sticking with a lower availability level. What could go wrong?

[UPDATE 7:45 – NETFLIX PURCHASED A SETUP THAT WAS NOT DEPENDENT UPON A SINGLE AMAZON SITE. FOR DETAILS, SEE MY MOST RECENT POST IN MY EMPOPRISE-BI BUSINESS BLOG, Today’s “I was wrong” – Netflix, Amazon, and availability.]

In my view, Amazon is going to get a lot of blame over the next few days because of this – heck, I titled this very post “Amazon” instead of mentioning Netflix, Pinterest, et al. However, the blame (if any is deserved) should go to the companies who didn’t buy a high availability package – either from Amazon, or from Amazon in combination with another service.
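Why does buying capacity in more than one location help? If the service stays up as long as any one site is up, and the sites fail independently, combined availability is 1 - (1 - a)^n for n sites that each have availability a. The Python sketch below illustrates this with a hypothetical per-site figure of 99.9%. The independence assumption is the weak spot: a regional storm or shared infrastructure can take down several sites at once, so real-world numbers will be worse than this idealized calculation.

```python
def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of a service that survives as long as any one site is up.

    Assumes (optimistically) that site failures are independent, so the
    probability that every site is down at once is (1 - a) ** n.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("need at least one site")
    all_sites_down = (1.0 - site_availability) ** sites
    return 1.0 - all_sites_down


if __name__ == "__main__":
    per_site = 0.999  # hypothetical per-site availability (three nines)
    for n in (1, 2, 3):
        print(f"{n} site(s): {combined_availability(per_site, n):.9f}")
```

Under these assumptions, going from one three-nines site to two yields roughly six nines on paper, which is exactly why correlated failures matter so much in practice.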

Weather was the cause of the outage – penny-pinching was the reason for it.


3 thoughts on “Amazon cloud failure – non-tech CAUSE, but tech REASON”

  1. Jim Ulvog said:

    Seems to me like all backup and availability issues are driven by tradeoffs. How much extra cost are you willing to incur for how much reduction in risk?

    Netflix, Instagram and Pinterest made their decisions.

    I make my own decisions and am changing them. Previously, I backed up all my data files weekly.

    That’s changed because for the first time in, oh, 15 years or more, my primary QuickBooks database crashed. File was open and *all* of the customer data disappeared an hour later. No idea what happened. I had to recreate 3 days of input since my last weekly backup.

    I’m now backing up major applications several times a week. I’m willing to incur the extra cost to reduce risk.

  2. I’m a Windows user, but one thing that I’ve observed about the Mac is that backups are ridiculously easy to perform. Just plug in a configured external hard drive, and it automatically backs up. And of course you have the various online services that can automate the process.

    Of course these solutions will work for an individual or small business owner, but don’t necessarily scale up.

    Incidentally, you may have missed it, but after I originally wrote this post, I subsequently discovered that Netflix WAS configured to use multiple Amazon locations – and still had a problem.

  3. Pingback: Empoprise-BI: Credit cards, light bulbs, and vending machine … | The Sale Machine Blog
