UPDATED 18:40 EST / DECEMBER 07 2011

NEWS

Is a Security Vulnerability Requiring a Global Amazon Web Services Reboot?

This morning, thousands of Amazon Web Services customers received notice that their instances on the cloud service are scheduled for reboot. “Routine upgrade” is the given reason, but cloud experts believe it is more than that and may be a security update needed to protect against a vulnerability in the Xen hypervisor.

The update is believed to apply to all Xen-based instances launched before Dec. 5. That means it is most likely a global reboot, with both 32-bit and 64-bit instances affected.

Cloudscaling Founder and CEO Randy Bias published a blog post with updates about the reboot. His best guess: a recent Debian/Xen security announcement may be relevant. Further, the root cause may be CVE-2011-1166, with AWS probably just being cautious in rebooting 32-bit instances as well:

A 64-bit guest can get one of its vCPUs into non-kernel mode without first providing a valid non-kernel pagetable, thereby locking up the host system.

The Debian/Xen security update recommends updating Xen packages, which appears to be what AWS is doing.

Bias posted an email that a customer received. It lists the instances scheduled to reboot.

Here’s an excerpt:

No action is required on your part. Each reboot will occur during the corresponding scheduled maintenance window listed above. Note that when a reboot is done, all of your configuration settings are retained. You also have the option to manage these reboots yourself at any time prior to the scheduled maintenance window.

If you do want to manage your reboots for yourself, or simply want more information on the reboot process, please visit the Amazon EC2 Maintenance Help Page at: http://aws.amazon.com/maintenance-help/

In an e-mail interview, I asked Bias about the reboot and its significance.

Is it certain that a Xen vulnerability was the reason for the update?

No. AWS has not provided details. The Xen vulnerability is speculation at this point. They claim it’s a ‘routine upgrade’, yet they are forcing many thousands of VMs to reboot within a week or face a forced reboot.  It’s unclear how big the impact is, but many people think it’s over most of their cloud.

What other reasons would there be for reboot?

It could be a ‘routine upgrade’, but usually this would be done in an organic manner where they let normal customer reboots get a fair amount of the existing VM population over some time horizon (1 month, 2 months, etc.).  The urgency and forced reboot within a short time window (~1 week) implies it’s not as ‘routine’ as they are claiming.

Any impacts? Is the reboot significant in any way?

Customers who haven’t architected their applications to be cloud-ready will likely face some pain. Fortunately, AWS is making it possible for them to manage the reboot process; however, for smaller customers I’m sure it’s quite painful unless they are using AWS for their data store (e.g. SimpleDB or RDS).

What’s remarkable about the reboot is how easy AWS makes it seem. This is a reboot across thousands of servers. Would this be possible in an enterprise private cloud? Bias says no such cloud of AWS’s size exists today, so there is no way to tell. But it is doubtful that any enterprise cloud provider could do a system-wide update in less than a week’s time.

