Hortonworks to Take Apache Hadoop Beyond MapReduce, Says Arun Murthy
The Hortonworks team of ex-Yahoo Hadoop developers is focused on its announced goal of getting half of the world’s data onto Apache Hadoop within the next five years, says Hortonworks Co-Founder and Architect Arun Murthy. To do that, it is developing a high-availability, massively scalable, fully open source version of Apache Hadoop based on the team’s work at Yahoo, where they ran Hadoop MapReduce across 50,000 machines.
The Next Generation Resource Manager, which will provide high availability on customers’ Hadoop systems, is already in field testing with a limited number of users, he told SiliconAngle CEO John Furrier and Wikibon.org Chief Analyst David Vellante in an interview at Hadoop World 2011 in New York City on Nov. 8, webcast live on SiliconAngle.tv. But while high availability and massive scaling will be a big deal for users, Hadoop needs more to meet the goal.
Until now, the only processing methodology Hadoop has provided has been MapReduce. While that is useful for many kinds of analysis and, says Murthy, will remain the main processing approach for Hadoop, it is not suitable for everything. Hortonworks is already looking beyond MapReduce and specifically is working to bring support for Message Passing Interface (MPI), used on many high performance computing systems, to Hadoop.
“MPI is the right way to do a subset of applications,” he says. “Today it is hard to run both a Hadoop cluster and an MPI cluster. Next generation will let you manage them in the same way, deploy them in the same way, and then process the data in the best way possible. By combining them in one compute framework instead of two separate frameworks, and running them with a single operations team rather than two, it brings the costs down dramatically.”
As with all Hortonworks Hadoop iterations, this will be completely open-sourced. And while Hortonworks might look like a competitor to other Hadoop platform developers such as Cloudera, “in the end we are all in the business of improving Hadoop, and to do that we have to talk to each other,” he says. “We are very focused on working with them on developing the Hadoop core. If that doesn’t improve fast enough, none of us will be in business long.”
The Hortonworks business play, Murthy says, is based on providing service rather than selling products. Companies can get the technology for free, but without deep technological knowledge and experience they are limited to running small clusters. And for the kind of massive-scale data sets involved in big data analysis, “you don’t want to be running lots of 10-node clusters. You want a 1,000-node cluster.”
As you move to those larger clusters, “at the end of the day you want to call on the people with the most experience. We have a rich history of running very large Hadoop clusters from our years at Yahoo. Companies will want that as they move forward.”