UPDATED 11:11 EDT / NOVEMBER 04 2011


5 Big Data Tools Built On Hadoop

Yesterday I looked at several alternatives to Apache Hadoop coming from companies like HPCC Systems, Twitter and Microsoft. These projects differentiate themselves from Hadoop by providing a more robust set of integrated tools and/or more accessible ways of performing analysis. But Hadoop has a large ecosystem of projects built on top of it, and those projects plug many of the same holes that the Hadoop alternatives try to fill.

Apache Mahout


Apache Mahout is a Java library of machine learning and data mining algorithms, many (but not all) of which are designed to run on Hadoop. The algorithms are designed to be highly scalable, a requirement for data mining on big data sets distributed across Hadoop clusters. They fall into four main use cases: recommendation mining, clustering, classification and frequent itemset mining.
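Mahout's clustering use case is easier to picture with a concrete algorithm. The sketch below is plain Java, not Mahout's API: a single-machine k-means (Lloyd's algorithm), one of the clustering algorithms Mahout includes. The class name, the first-k seeding and the sample points are illustrative assumptions. On a Hadoop cluster, each assign-then-average iteration would typically run as one MapReduce pass over the data.

```java
import java.util.Arrays;

// Minimal single-machine k-means (Lloyd's algorithm). Illustrative only:
// this is NOT Mahout's API, just the kind of job Mahout distributes on Hadoop.
class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the cluster index for each point. Assumes k <= points.length and
    // seeds centroids with the first k points (a simplification; real tools
    // use random or k-means++ seeding).
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step ("map"): attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = squaredDistance(points[p], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double dist = squaredDistance(points[p], centroids[c]);
                    if (dist < bestDist) {
                        best = c;
                        bestDist = dist;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step ("reduce"): move each centroid to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {
            {1.0, 1.0}, {9.0, 9.0}, {1.2, 0.8}, {8.8, 9.1}, {0.9, 1.1}, {9.2, 8.9}
        };
        // Prints [0, 1, 0, 1, 0, 1]: the two well-separated groups.
        System.out.println(Arrays.toString(cluster(points, 2, 10)));
    }
}
```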

GoldenOrb


GoldenOrb is an open source graph database built on Hadoop and based on Google's Pregel paper. It's a fitting extension to Hadoop, since Hadoop itself is based on Google's MapReduce and Google File System papers (and HBase, the Hadoop database, on Google's BigTable paper). The project is sponsored by Ravel.

A graph database is designed to explore the network of relationships between items in a database, such as the relationships between people in a social network. GoldenOrb is in early development now, but could eventually be used for social graph analysis, data mining, fraud detection and more.
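The Pregel model GoldenOrb follows is easiest to see in miniature. Computation proceeds in supersteps: in each one, every vertex reads the messages sent to it in the previous superstep, updates its own value, and sends messages along its edges; the job ends when no messages remain. The Java below is a single-process simulation of that pattern, not GoldenOrb's API, and the class name, method name and connected-components task are illustrative choices. Each vertex keeps the smallest vertex id it has heard of, so vertices in the same component converge on a shared label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Single-process simulation of the Pregel "think like a vertex" model.
// Illustrative only: this is NOT GoldenOrb's API.
class PregelSketch {

    // Labels every vertex with the smallest vertex id in its connected
    // component. adjacency[v] lists v's neighbors in an undirected graph.
    static int[] connectedComponents(int[][] adjacency) {
        int n = adjacency.length;
        int[] label = new int[n];
        List<List<Integer>> inbox = new ArrayList<>(); // messages for this superstep
        for (int v = 0; v < n; v++) {
            label[v] = v; // each vertex starts in its own component
            inbox.add(new ArrayList<>());
        }
        boolean firstSuperstep = true;
        boolean messagesInFlight = true;

        while (messagesInFlight) {
            List<List<Integer>> outbox = new ArrayList<>();
            for (int v = 0; v < n; v++) outbox.add(new ArrayList<>());
            messagesInFlight = false;

            for (int v = 0; v < n; v++) {
                // compute(): keep the minimum of own label and incoming labels.
                int smallest = label[v];
                for (int msg : inbox.get(v)) smallest = Math.min(smallest, msg);
                boolean improved = smallest < label[v];
                label[v] = smallest;
                // Send only in superstep 0 or when the label improved;
                // otherwise the vertex effectively votes to halt.
                if (firstSuperstep || improved) {
                    for (int neighbor : adjacency[v]) {
                        outbox.get(neighbor).add(label[v]);
                        messagesInFlight = true;
                    }
                }
            }
            // Barrier: messages become visible only in the next superstep.
            inbox = outbox;
            firstSuperstep = false;
        }
        return label;
    }

    public static void main(String[] args) {
        // Two components: {0, 1, 2} and {3, 4}.
        int[][] graph = { {1}, {0, 2}, {1}, {4}, {3} };
        System.out.println(Arrays.toString(connectedComponents(graph))); // [0, 0, 0, 3, 3]
    }
}
```

Sending messages only when a label changes is what lets the job stop on its own: once no vertex improves, no messages are produced and the loop exits.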

Datameer Analytics Solution


Datameer Analytics Solution is a business intelligence and data visualization application built on Apache Hadoop. It's one of several products that are attempting to make Hadoop more accessible to non-developers (see also Karmasphere). Datameer provides wizards for setting up data integrations and a spreadsheet-style interface for working with data and creating visualizations. It supports multiple Hadoop distributions, including those from Cloudera and MapR.

WibiData

I wrote about WibiData from Odiago yesterday. It's a data management and analytics product from a new startup launched by a co-founder of Cloudera.

HStreaming


One of Hadoop's noted weaknesses is its lack of support for real-time analytics. Hadoop is engineered to run finite batch jobs, not never-ending jobs on ever-changing data. HStreaming is one of a few projects that address this. It offers an on-premises Enterprise Edition and a Cloud Edition that runs on Amazon Web Services.
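The difference between a finite batch job and a never-ending job on changing data can be sketched in a few lines of Java. The batch method rescans a complete, fixed data set every time it is asked; the counter instead updates its answer as each event arrives, holding only the events inside a sliding time window. Everything here (class names, the one-second window, in-order timestamps) is an illustrative assumption and not HStreaming's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Batch versus incremental (streaming) computation of the same answer:
// how many events fall in the window (now - windowMillis, now].
// Illustrative only: this is NOT HStreaming's API.
class StreamingSketch {

    // Batch style: a finite job that scans the whole data set each time.
    static int batchWindowCount(long[] timestamps, long now, long windowMillis) {
        int count = 0;
        for (long t : timestamps) {
            if (t > now - windowMillis && t <= now) count++;
        }
        return count;
    }

    // Streaming style: a long-running operator that keeps only the events
    // still inside the window and updates its answer per event.
    static final class SlidingWindowCounter {
        private final long windowMillis;
        private final Deque<Long> events = new ArrayDeque<>();

        SlidingWindowCounter(long windowMillis) {
            if (windowMillis <= 0) {
                throw new IllegalArgumentException("window must be positive");
            }
            this.windowMillis = windowMillis;
        }

        // Records one event (timestamps assumed non-decreasing) and returns
        // the number of events in the last windowMillis.
        int onEvent(long timestampMillis) {
            events.addLast(timestampMillis);
            // Evict events that have fallen out of the window.
            while (events.peekFirst() <= timestampMillis - windowMillis) {
                events.removeFirst();
            }
            return events.size();
        }
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(1000);
        long[] arrivals = {0, 200, 900, 1100, 2500};
        for (long t : arrivals) {
            // Prints counts 1, 2, 3, 3, 1 as events enter and leave the window.
            System.out.println("t=" + t + " count=" + counter.onEvent(t));
        }
    }
}
```

Because the counter does a small, bounded amount of work per event instead of rescanning all the data, its answer is available continuously, which is the property a batch-oriented job lacks.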

Services Angle

Doing big data analysis with Hadoop doesn't end with the . The ecosystem of tools that build upon or extend Hadoop (such as Hive) and make it more accessible is Hadoop's greatest strength, and something projects like HPCC Systems and Spark can't yet match. Database, enterprise data warehouse and business intelligence companies are all tripping over themselves to provide integration with Hadoop, with even Microsoft and Oracle jumping in.

Next week the SiliconANGLE team will be at the Hadoop World event in New York City. It's completely sold out, but we'll be covering the action live on our online show theCUBE.
