UPDATED 14:02 EDT / NOVEMBER 03 2011

Another Hadoop Alternative: Spark

I just published a list of Apache Hadoop alternatives, but here’s another one for the list: Spark. Spark is a distributed in-memory data analytics platform that uses the Scala programming language. IBM claims that Spark should be much faster than Hadoop because it uses in-memory analytics instead of Hadoop’s cluster file system approach. Spark was developed at the UC Berkeley AMP Lab along with Mesos, which is now an Apache Incubator project.

According to a recent paper on Spark from IBM:

Spark is an open source cluster computing environment similar to Hadoop, but it has some useful differences that make it superior in certain workloads—namely, Spark enables in-memory distributed datasets that optimize iterative workloads in addition to interactive queries.

Spark is implemented in the Scala language and uses Scala as its application framework. Unlike Hadoop, Spark and Scala create a tight integration, where Scala can easily manipulate distributed datasets as locally collective objects.

Although Spark was created to support iterative jobs on distributed datasets, it’s actually complementary to Hadoop and can run side by side over the Hadoop file system. This behavior is supported through a third-party clustering framework called Mesos. Spark was developed at the University of California, Berkeley, Algorithms, Machines, and People Lab to build large-scale and low-latency data analytics applications.

Spark is currently in use at Conviva.
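
To make the in-memory point concrete, here is a minimal sketch in plain Python. It is not Spark's API and not code from the IBM paper: `CountingSource`, `iterate_reloading` and `iterate_cached` are hypothetical names made up for illustration, and the "dataset" is just a small local list standing in for data on a cluster file system. It only shows why an iterative job benefits from keeping its working set in memory rather than re-reading it on every pass.

```python
class CountingSource:
    """Hypothetical stand-in for slow storage; counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Each call models one full read of the dataset from storage.
        self.reads += 1
        return list(self._values)


def iterate_reloading(source, steps):
    """Iterative job that re-reads the dataset from storage on every pass."""
    estimate = 0.0
    for _ in range(steps):
        data = source.load()
        mean = sum(data) / len(data)
        # Move the running estimate halfway toward the dataset mean.
        estimate += 0.5 * (mean - estimate)
    return estimate


def iterate_cached(source, steps):
    """Same iterative job, but the dataset is read once and kept in memory."""
    data = source.load()
    estimate = 0.0
    for _ in range(steps):
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)
    return estimate
```

Run with ten steps, the reloading version reads the source ten times while the cached version reads it once, and both return the same estimate. In a real cluster the saving would come from avoiding repeated disk and network reads, which this toy example only counts rather than measures.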

Services Angle

Spark is a fresh approach that demonstrates that Hadoop isn’t necessarily the end-all-be-all of big data analytics. There’s quite a bit of room for improvement on Hadoop’s model, whether that’s through Hadoop distributions that add tools to the Hadoop stack or through alternatives like Spark and the others I’ve written about. Most of these tools don’t have the traction that Hadoop has yet, but the market is still open.

Photo by Jonas Maaløe Jespersen

