UPDATED 14:40 EST / NOVEMBER 12 2014

LinkedIn’s latest open-source project supercharges Hadoop

LinkedIn Inc. is releasing yet another internally developed framework for Hadoop under an open-source license, in a bid to help organizations that can’t afford to hire an army of expensive specialists to fine-tune every detail of their analytic clusters. The project adds to the already formidable pile of community contributions that the web-scale crowd has racked up over the course of its journey to push the boundaries of large-scale data processing.

Hadoop itself was born of that endless pursuit, along with many of the complementary technologies in the surrounding ecosystem, including the most recent addition, an engine called Kylin that eBay Inc. developed to spare internal users long delays when digging for data in its massive deployment. The newly revealed Cubert framework from LinkedIn extends that vision beyond queries to the full gamut of operations in Hadoop, from organizing information for analysis to carrying out the processing.

Cubert implements the lessons that the social networking powerhouse learned when laying the foundation for its XLNT engagement testing platform, which proved too taxing for existing Hadoop sub-projects to handle. After spending several months trying, to little avail, to make the tools they already had work, LinkedIn’s engineers decided to build an entirely new system to bear the brunt of the complex data manipulations in XLNT.

The technology served its purpose, but the developers found themselves having to rewrite large portions of the underlying code in order to accommodate the new use cases that the success of the project drew over time. So they set out to come up with an answer to the requirements of XLNT for the third time, and thus Cubert was born.

Tackling all 3 levels of the analytic stack

The framework provides an engine for finding simple solutions to complex analytical problems that might normally prove too resource-intensive to solve within an allocated time frame. It cuts across all three levels of the analytic stack.

In the storage layer, Cubert uses a combination of abstractions over the Hadoop File System to organize data as blocks structured for the most efficient access possible. These partitions are manipulated with operators located one level higher up, at the execution layer, that automate tasks not directly supported in other platforms, such as mapping out relationships between entities and calculating statistical positions. Finally, this functionality is exposed to developers through a simplified syntax dubbed Cubert Script, implemented at the top of the stack, that makes it possible to specify workload execution paths without writing any Java code.
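The general idea behind the storage and execution layers can be illustrated with a small Python sketch. This is not Cubert code or Cubert Script: the function names (`build_blocks`, `join_blocks`), the modulo partitioning rule and the sample records are all assumptions made for illustration. It shows only why organizing two datasets into blocks by the same key lets an operator relate them one block at a time rather than scanning everything.

```python
from collections import defaultdict


def build_blocks(records, key, num_blocks):
    """Partition records into blocks by an integer key (hypothetical rule:
    key modulo num_blocks) and sort each block by that key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[key] % num_blocks].append(record)
    return {block_id: sorted(rows, key=lambda r: r[key])
            for block_id, rows in blocks.items()}


def join_blocks(left_blocks, right_blocks, key):
    """Join two datasets block by block. Because both sides were partitioned
    with the same rule, matching keys always share a block id."""
    joined = []
    for block_id, left_rows in sorted(left_blocks.items()):
        # Index only the right-hand rows that live in the same block.
        lookup = defaultdict(list)
        for row in right_blocks.get(block_id, []):
            lookup[row[key]].append(row)
        for left in left_rows:
            for right in lookup.get(left[key], []):
                joined.append({**left, **right})
    return joined


# Made-up sample data, for illustration only.
members = [
    {"member_id": 3, "country": "US"},
    {"member_id": 4, "country": "IN"},
    {"member_id": 7, "country": "BR"},
]
clicks = [
    {"member_id": 4, "clicks": 12},
    {"member_id": 3, "clicks": 5},
    {"member_id": 4, "clicks": 2},
]

member_blocks = build_blocks(members, "member_id", num_blocks=2)
click_blocks = build_blocks(clicks, "member_id", num_blocks=2)
result = join_blocks(member_blocks, click_blocks, "member_id")
```

In this toy version each block is joined independently, which is the property that would let such work be spread across a cluster. How Cubert actually lays out and processes its blocks is not described in detail here.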

That provides a relatively straightforward interface for optimizing data processing, which LinkedIn says can help users accelerate analytics by up to 60 times. At launch, Cubert works only with the default MapReduce execution engine in Hadoop, but the company plans to leverage the extensibility of the framework to add support for the faster Spark engine further down the road. More analytic functions and increased automation are in the works as well.

photo credit: Camil Tulcan via photopin cc
