UPDATED 10:00 EST / DECEMBER 23 2014

HP thinks it’s got a better way to run Hadoop | #HPdiscover

Running Hadoop on converged infrastructure is not a particularly attractive proposition. The massive workloads that the data-crunching framework is designed to handle require an entirely different ratio of compute to storage resources than the typical enterprise application demands. That balance can be all but impossible to address cost-effectively when the two come in a single box that only scales horizontally.

Or at least, it used to be. In a recent appearance on theCUBE from HP’s Discover conference in Barcelona, HP Senior Engineering Manager Steve Tramack said his team has managed to overcome that limitation with a unique architectural approach that allows organizations to take advantage of the convenience of the Hadoop deployment model without compromising the efficiency of their analytics clusters. HP hopes it will be a game-changer amid the explosion of use cases for Hadoop.

“As organizations start to aggregate and assimilate data, they’re starting to see business value and all of a sudden they’re moving from batch to multiple workloads, and those workloads bring multiple copies of the same data and different requirements,” Tramack told theCUBE host Dave Vellante. To address the growing diversity of applications running on top of Hadoop, HP is redefining the supporting infrastructure.

The company recently unveiled a design for a converged system that takes advantage of new features in the latest version of Hadoop, namely the ability to define groups of nodes within a cluster and distinguish different storage types, which Tramack called an “asymmetric” environment. Instead of running the entire platform on the same infrastructure, each workload and component is deployed on the partition best suited to meet its specific requirements.
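The node-grouping capability described here corresponds to YARN node labels, which landed in Hadoop 2.6 and let administrators carve a cluster into partitions with different hardware profiles. As a minimal sketch of how such an "asymmetric" layout might be set up (the label name, hostname, and store path below are hypothetical, not from HP's design):

```shell
# Prerequisite settings in yarn-site.xml (label store path is hypothetical):
#   yarn.node-labels.enabled = true
#   yarn.node-labels.fs-store.root-dir = hdfs:///yarn/node-labels

# Define a label for flash-equipped compute nodes (label name is hypothetical)
yarn rmadmin -addToClusterNodeLabels "flash(exclusive=false)"

# Attach the label to a specific NodeManager (hostname/port are hypothetical)
yarn rmadmin -replaceLabelsOnNode "compute-node-01:8041=flash"

# Verify which labels are defined on the cluster
yarn cluster --list-node-labels
```

Capacity Scheduler queues can then be granted access to specific labels, so each workload lands on the partition whose hardware suits it.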

In HP’s reference architecture, the Hadoop Distributed File System is spread across the storage servers of the different nodes, while YARN and the other software that manipulates the data is deployed on flash-equipped Moonshot systems packing 45 of Intel’s newest data center processors. “We’re using that for file system access, so we’re gaining the benefit of flash in a very cost-effective manner and we’re using spinning media for the primary data storage,” Tramack explained.

The concept isn’t new, he added. “These concepts are very similar to what’s in the Cray architecture, with its neat little compute blocks and storage blocks.” Unlike supercomputers, however, converged infrastructure ships one compact module at a time, which lowers the barrier to entry and enables organizations to be much more flexible in how they scale their environments while optimizing hardware use.

Watch the full interview (15:43)

