UPDATED 11:39 EST / NOVEMBER 16 2010

A Tutorial for Hadoop and Map Reduce in Java

Hadoop has become an extremely big name here at SiliconANGLE, being one of the premier open source cloud-storage and -computing projects. If you’re a Java developer and you haven’t had a chance to take it for a test drive, there’s a very easy tutorial up by Carlo Scarioni covering Hadoop basics.

Hadoop is an open source project for processing large datasets in parallel with the use of low level commodity machines.
Hadoop is built on two main parts: a special file system called the Hadoop Distributed File System (HDFS) and the Map Reduce framework.
The HDFS File System is an optimized file system for distributed processing of very large datasets on commodity hardware.
The Map Reduce framework processes the data in two main phases: the Map phase and the Reduce phase.

The tutorial shows a developer where to download the source files from Apache and how to unpack the helper executables, and it provides a small amount of Java code.

The code implements a dictionary translation: it takes a series of compiled dictionaries (English-Spanish, English-Italian, English-French) and outputs a single dictionary that displays each English word followed by every translation. Under normal circumstances, the code would start with an English word and then search every file for each instance. Hadoop speeds this up by distributing the files and the processing.
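To make the Map and Reduce phases concrete, here is a minimal in-memory sketch of the dictionary merge in plain Java. It is not Carlo's code and does not use the Hadoop API at all; the class name `DictionaryMerge`, the tab-separated "english, translation" line format, and the `|` separator are assumptions made purely for illustration. A real Hadoop job would express the same two steps as Mapper and Reducer classes and let the framework do the grouping across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the map -> group -> reduce flow.
public class DictionaryMerge {

    // Map phase: turn one "english<TAB>translation" line into a (key, value) pair.
    // Returns null for lines that do not match the assumed format.
    public static Map.Entry<String, String> map(String line) {
        String[] parts = line.split("\t", 2);
        if (parts.length < 2) {
            return null;
        }
        return Map.entry(parts[0].trim(), parts[1].trim());
    }

    // Grouping step (done by the framework in Hadoop): collect all values per key.
    public static Map<String, List<String>> group(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse every translation of one English word into one string.
    public static String reduce(List<String> translations) {
        return String.join("|", translations);
    }

    // Runs all three steps over the lines of every dictionary file combined.
    public static Map<String, String> merge(List<String> lines) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String line : lines) {
            Map.Entry<String, String> pair = map(line);
            if (pair != null) {
                pairs.add(pair);
            }
        }
        Map<String, String> merged = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : group(pairs).entrySet()) {
            merged.put(entry.getKey(), reduce(entry.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the Spanish, Italian and French dictionaries.
        List<String> lines = List.of(
                "dog\tperro", "cat\tgato",
                "dog\tcane", "cat\tgatto",
                "dog\tchien");
        merge(lines).forEach((word, translations) ->
                System.out.println(word + "\t" + translations));
    }
}
```

Because each line is mapped independently and each English word is reduced independently, both steps can be spread over many machines, which is the property Hadoop exploits. In this sketch the translations keep their input order; a distributed run would not necessarily preserve it.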

The code uses a cloud-storage mechanism to speed up the hash mapping of the various dictionaries, but it does not use cloud-processing to accelerate itself. Since this is only a basic tutorial series, Carlo mentions that he’ll hit that up later.

So, if you know Java and want to play around with Hadoop, here is an excellent place to begin.

Also, it’s a good way to get an understanding of how this framework can give you a jumpstart on the cloud computing revolution.

