UPDATED 17:15 EDT / OCTOBER 14 2009

AppleInsider Ascribes to Malice What Could Easily Be Explained By Incompetence

This is truly bizarre. Though it only just came to my attention, AppleInsider reported Monday on the whole Sidekick data loss problem we’ve been talking about for the last several days. They say it was most likely sabotage.

The fact that no data could be recovered after the problem erupted at the beginning of October suggests that the outage and the inability to recover any backups were the result of intentional sabotage by a disgruntled employee. In any other circumstance, Microsoft or T-Mobile would likely have come forward with an explanation of the mitigating circumstances, blaming bad hardware, a power failure, or some freak accident.

An act of sabotage "would explain why neither party is releasing any more details: for legal reasons dealing with the ongoing investigation to find the culprit(s)," one of the sources said. Due to the way Sidekick clients interact with the service, any normal failure should have resulted in only a brief outage until a replacement server could be brought up.

The very long outage of core functionality, followed by an incapacity to recover any data, both point to the possibility that "someone with access to the servers at the datacenter must have inserted a time bomb to wipe out not just all of the data, but also all of the backup tapes, and finally, I suspect, reformatting the server hard drives so that the service itself could not be restarted with a simple reboot (and to erase any traces of the time bomb itself)."

Unlike a more conventional incident involving a suspicious failure, the source said, "the Microsoft IT forensic investigators who would normally be called upon to investigate this sort of thing are all trained on Windows servers and have no clue of any of the details of the Sidekick service.

"If this was an ordinary sort of failure, the service would have come back within a day, so once again, all signs point to sabotage. If they erased the server hard drives, they would have to reinstall the OS on each affected server, then reload all of the server-side software and start everything back up, and who knows how many people are remaining at Danger who even know how to do all of that? Once again, there is no-one on the Microsoft side who is going to know how to do any of this.

"Certainly Microsoft has armored themselves against any kind of similar sabotage on the Redmond side, but Danger was always run like a small company where individual employees had a higher level of access to servers and such. With Google, Amazon, and others promoting their own cloud services, why would anyone choose Microsoft for anything remotely mission critical after this fiasco?"

I’ve spoken to a number of individuals fairly familiar with SAN technology, and our own James Watters spoke to this very topic on Monday, when the AppleInsider report came out.

The irony here, folks, is that many in the popular press idiot seats are using this whole incident as a reason to mistrust cloud computing, when Amazon S3 would have proved a far more foolproof way of outsourcing this data than hiring some random PS guys to do custom SAN work.

The likelihood is that this failure is due to the SAN technology, at least according to the refrain I keep hearing from the IT professionals I’ve consulted on this.

“SANs can and will fail. It’s a simple fact,” one admin told me. “Any administrator worth his salt makes certain he has multiple redundancies in place for everything.”

According to another admin I talked to, Danger’s system was said to have been built on Oracle Real Application Clusters (RAC), with the data stored on a SAN, a design meant to prevent hardware failure from impacting data accessibility. Using a high-dollar system like RAC is no guarantee that it’s unbreakable.

Over at MobileCrunch, one commenter says, “I’ve got an IBM doc (sg246363) that says: ‘Prior to physically installing new hardware, refer to the instructions in IBM TotalStorage DS4000 hard drive and Storage Expansion Enclosure Installation and Migration Guide, GC26-7849, available at: […snip…] Failure to consult this documentation may result in data loss, corruption, or loss of availability to your storage.’”

He continues:

Does that imply that plugging the wrong thing into the wrong place at the wrong time can eff up an entire SAN? Yep. It does, and from what the service manager for one of our vendors told me, it did happen recently – to one of the local Fortune 500’s.

Heck, without the proper checks and balances in place even a fat-fingered DBA or sysadmin could accidentally wipe all the data on a system, or even just that little bit that is critical. The SAN upgrade could be entirely co-incidental to the whole thing. Without a working backup they’re still screwed.

From everything I’ve heard, while sabotage may be theoretically possible, it’s far from plausible. These SAN units are generally complex and proprietary, which makes it difficult to write a logic bomb for them, as is being alleged.

What exactly happened?  I’m not sure we’ll ever know for certain. 

What we can be certain of is that this isn’t an incident that points to the fallibility of the cloud so much as the unavoidable fallibility of human process. Whether it was sabotage, backup failure, or even “dogfooding” as AppleInsider claims, it points to a failure of management and process more than technology. Any pundit who tries to paint a picture otherwise has an agenda to push.

