Amazon fails, big time

Back in the early days of Amazon, around 1995 or so, a friend of mine placed an order that got botched. He mailed a letter (not email, but the real kind) to Jeff Bezos and never heard back: no acknowledgement, apology or “thanks for your support” message.

My friend deleted his Amazon account and has never purchased anything from Amazon.com since — something I make fun of continuously.

When it comes to purchasing books, ebooks, MP3s and certain gizmos, I am a one-click Amazon devotee. However, the past 24 hours have turned our company (Hammock Inc.) into a likely candidate to be like my friend when it comes to using Amazon’s “cloud services” for activities central to our business: We will likely shut down much of what we do there, and never look back.

First, some background. Well, actually, I doubt anyone who doesn’t already know the background really wants it. However, you can read about it here, but only after several cups of coffee. Short version: The past 24 hours have blown Amazon’s reputation as a dependable provider of large-scale hosting of web services.

My story: We use three different major hosting companies for projects and web-properties for ourselves and for clients. Because we have used Amazon’s Web Services (AWS) for several years to host the heavily-used and constantly changing wiki, SmallBusiness.com, we typically develop new wikis for clients in the same hosting environment. “Because it’s Amazon” has been enough to gain approval for such an arrangement from some rather by-the-book corporate systems administrator types.

Hosting a wiki is not up there with using AWS for running Foursquare. However, if you’re in the final 24 hours of a development cycle that has taken several months of work by a team of wiki-developers, editors and taxonomists, you don’t want to see those months of work turn into a white screen for 24 hours.

We decided to wait it out yesterday, knowing we had a complete backup of the project and hoping not to shift into a mode where systems administration becomes the driving factor when usability and tweaking copy should be.

But, alas. We’ll (and by “we,” I mean someone other than me who will have to put up with me constantly bothering him) be spending the next few hours doing things I only slightly comprehend that use terms like “instance” in ways normal people don’t use them and other terms like “idempotence” that are made up so that systems administrators will get paid more.

Bottom line: I still like Amazon for MP3s and eBooks.

[Later: Not only has the site been restored, but I’ve also added greatly to my vocabulary of technical jargon that I vaguely comprehend but that sounds important: “hot spares” and “warm failovers” are my new favorites.]

  • This underscores the importance of snapshots. It also clears up for me the flexibility of snapshots to be restored in other zones. Changes are happening with us in this department, but I’m not leaving Amazon. It’s still the best thing out there from what I can tell.

  • Agree this is quite a spectacle. My fingers are crossed that Amazon’s going to learn a lesson and be safer than ever. Wishful thinking?

  • I guess I should be clear: We’ll do whatever our systems guru determines is best, and it will likely include multiple solutions. I’m in over my head when it comes to being anything other than a bit steamed at the moment.

  • For me, the biggest letdown here was Amazon’s description of how their infrastructure was configured in 4 “independent” zones, when the problems that occurred actually affected multiple zones at the same time.

    You have to be pretty philosophical about what’s happened. Outages happen. I’ve had generator fires, UPS failures and SAN configuration issues that have dragged on for days. The cloud doesn’t make this any different; you still have to plan your infrastructure to work around those potential failures. The disparity I described above meant, however, that I couldn’t recover using the provisions I had in place until some 16 hours after the initial failure.

    Lesson learned: I will be looking into having data on standby in the California region to accompany my live systems in Virginia. But if there’s any reason to be disappointed in Amazon, it’s not that they had a failure, it’s that they didn’t fail as advertised.

  • Hot spare and warm failover? Define, please. Loves me some lingo.

  • Sure can tell that Wikipedia article was written by a techie: “A hot spare or hot standby is used as a failover mechanism.” Uh-huh.