5 Distinct Hadoop Deployment Patterns
In an e-mail interview this week with Forrester Senior Analyst James Kobielus, I asked about Hadoop’s real-time capabilities. The conversation turned to what he sees as five distinct Hadoop deployment patterns.
It’s a good primer for Hadoop World next week, where SiliconANGLE will live stream from theCUBE.
Here they are:
Leverage Proprietary Hadoop Distros: Use the proprietary near-real-time/real-time features of some commercial Hadoop distros (e.g., HStreaming, Outerthought, Hadapt).
Leverage Hadoop Core Sub-Projects: Use HBase as the database/storage layer for near-real-time analysis and Cassandra for real-time requirements beneath your MapReduce modeling/execution abstraction layer.
Leverage Hadoop and Other NoSQL Databases: Supplement and/or replace HBase/Cassandra with Membase, Couchbase, or other real-time and/or in-memory databases under MapReduce.
Leverage Hadoop and Real-Time Features of Commercial Enterprise Databases and Data Warehousing Platforms: Support batch or real-time features of Hadoop (open-source and/or proprietary distros) with change data capture, complex event processing, or other real-time data ingest/processing features of commercial enterprise data warehouse (EDW) platforms such as Teradata, Oracle Exadata, IBM Smart Analytic System, EMC Greenplum Database and other commercial offerings.
Leverage Hadoop and Stand-Alone Complex Event Processing or Message-Oriented Middleware: Support batch or real-time features of Hadoop (open-source and/or proprietary distros) with complex event processing and/or message-oriented middleware (MOM) from IBM, SAP/Sybase, StreamBase, TIBCO, etc.
Services Angle
One thing Kobielus points out is Hadoop’s immaturity. For example, in his report, “Enterprise Hadoop: The Emerging Core of Big Data,” Kobielus says that among Hadoop specifications, only Cassandra extends transactional functionality to a wider range of enterprise applications, beyond Hadoop’s core focus on advanced analytics. The proprietary vendors have added features, such as two-phase commit and rollback, to bring online transaction processing functionality to their offerings.
Kobielus says vendors are offering their own extensions, such as real-time and high-availability features, to address limitations of the current Apache Hadoop open-source distribution. “The Hadoop community is evolving the core codebase to address these deficiencies, but it may take several years before the open-source distribution becomes a more robust cloud analytics and transaction platform.”
The reality: Hadoop is still very early in its development. But is it too slow? That’s a question we will be asking a lot next week at Hadoop World.