UPDATED 09:08 EST / DECEMBER 19 2014

Google opens cloud-based Hadoop alternative to developers with free SDK

Google on Thursday released a software development kit (SDK) for its cloud-based data crunching engine in an effort to spur the development of analytic applications against the service. The launch comes seven months after the search giant first unveiled its ambitious plans to steal Hadoop’s thunder.

Currently available in limited access, Cloud Dataflow is an evolution of existing analytic technologies in the open-source ecosystem that aims to eliminate most of the hassle required to process large quantities of unstructured information coming from multiple sources. The platform accomplishes that with a unified programming interface that makes it possible to handle static batches of historical data and real-time streams under the same coding layer.

Cloud Dataflow abstracts the nuances of different information types into consistent “PCollections” that can pull updates from a specified source or perform any number of other tasks. Developers can manipulate these adaptive datasets using a built-in library of operations covering many of the functions available for Hadoop and then some.
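
To make the idea of one coding layer for batch and streaming data concrete, here is a minimal conceptual sketch in plain Python. It does not use the Cloud Dataflow SDK, which is Java-only at launch, and every name in it (`parse_events`, `total_per_user`, `pipeline`, the sample records) is a hypothetical stand-in rather than part of Google's API. A finite generator plays the role of a live stream, which in practice would be unbounded.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def parse_events(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Turn 'user,amount' lines into (user, amount) pairs, skipping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue  # ignore records that do not match the expected shape
        yield parts[0].strip(), int(parts[1])


def total_per_user(events: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the amounts for each user."""
    totals: Counter = Counter()
    for user, amount in events:
        totals[user] += amount
    return dict(totals)


def pipeline(source: Iterable[str]) -> Dict[str, int]:
    """The same transformation chain, regardless of where the data comes from."""
    return total_per_user(parse_events(source))


# Hypothetical "batch" source: a static list of historical records.
historical_batch = ["alice,3", "bob,5", "alice,4", "garbage"]


def live_stream() -> Iterator[str]:
    """Hypothetical "stream" source: records arriving one at a time (finite here)."""
    for line in ("bob,1", "carol,7", "bob,2"):
        yield line


batch_totals = pipeline(historical_batch)
stream_totals = pipeline(live_stream())
print(batch_totals)
print(stream_totals)
```

The point of the sketch is only that `pipeline` never needs to know whether its input is a stored dataset or an incoming feed. A real streaming system would also have to handle windowing and unbounded input, which this example leaves out.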

Those operations are executed in a way that makes it possible to efficiently reuse code across multiple workloads, which saves time and effort while enabling the underlying runtime to collapse repeated actions for faster execution. Cloud Dataflow also incorporates performance analysis metrics, system monitoring functionality and other operational capabilities of Google’s infrastructure-as-a-service to automate many of the management details.

The search giant’s vision for the platform contrasts sharply with Hadoop, which is a collection of independently maintained and often overlapping open-source projects. Deploying Hadoop has become more feasible in recent years thanks to the emergence of on-demand hosting options, but processing real-time and historical data on the same cluster can still require stringing together multiple tools, each with its own architecture and syntax. As a result, many IT organizations have struggled to realize the full potential of Hadoop.

Cloud Dataflow aims to make that functionality available to everyday developers with a simple interface and a utility pricing scheme. The new SDK, which is available under an open-source license, makes it possible to harness the service for next-generation analytic applications and import data from existing Hadoop environments.

The platform only works with Java for now, but Google plans to add support for Python and other programming languages. The search giant also allows developers to extend the native syntax of Cloud Dataflow themselves with custom operators for automating complex transformations.
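
As a rough illustration of what a custom operator can look like, the plain-Python sketch below bundles several small steps into one reusable transformation. It is an analogy only, not the Cloud Dataflow SDK: the `compose` helper and the step names (`drop_blank`, `normalize`, `dedupe`, `clean_text`) are hypothetical and chosen purely for the example.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Step = Callable[[Iterable[str]], Iterable[str]]


def compose(*steps: Step) -> Step:
    """Chain several steps into one reusable operator, applied left to right."""
    def composite(data: Iterable[str]) -> Iterable[str]:
        return reduce(lambda acc, step: step(acc), steps, data)
    return composite


def drop_blank(lines: Iterable[str]) -> Iterator[str]:
    """Discard empty or whitespace-only records."""
    return (line for line in lines if line.strip())


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Trim whitespace and lowercase each record."""
    return (line.strip().lower() for line in lines)


def dedupe(lines: Iterable[str]) -> Iterator[str]:
    """Keep only the first occurrence of each record, preserving order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# A hypothetical custom operator built from the three smaller steps.
clean_text = compose(drop_blank, normalize, dedupe)

result = list(clean_text(["  Hadoop ", "", "hadoop", "Dataflow"]))
print(result)
```

Once defined, `clean_text` can be dropped into any pipeline as a single step, which is the kind of reuse that custom operators are meant to enable.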

