Wikibon says Hortonworks DataFlow is a stream processor with a twist
Hortonworks Inc.’s DataFlow, which the company brought to market through its purchase of Onyara Inc., is much more than just another stream processor. It has a unique set of capabilities that makes it hard to classify and that answers the needs of the Internet of Things (IoT) and Internet of Anything (IoAT) domains, writes Wikibon Big Data Analyst George Gilbert. But Hortonworks’ obvious intent to combine DataFlow with its Hadoop distribution signals the beginning of fragmentation of the Hadoop environment. Hadoop is entering an era similar to that of the fragmented Unix environment of the 1990s.
DataFlow does the job of a stream processor but, unlike most stream processors, is bi-directional, having a separate channel to send and receive commands that control devices and applications. It’s designed to extend beyond the data center to the edge of complex networks, and it has the resilience, lineage and security capabilities of traditional databases.
These extra qualities make it ideal for the IoT, which is a decentralized environment. The IoT will use intelligent end-point devices to gather large quantities of data. It will often use remote computing devices to capture, analyze and store data close to the point of generation rather than trying to send huge volumes through the network to a central data center. And those remote devices also need to be controlled from a central location. A smart electrical grid, for instance, not only needs to monitor the power usage of all appliances in every home, but also to adjust temperature settings when it knows the house is empty. Having two channels makes this task much simpler to accomplish.
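The smart-grid example above can be sketched as a simple simulation of the two-channel pattern: telemetry flows up from an edge device while a separate command channel carries control messages back down. This is a minimal, hypothetical illustration; the class and message names are assumptions for this sketch, not part of DataFlow's actual API.

```python
# Hypothetical sketch of a bi-directional edge pattern: one channel carries
# telemetry upstream to a central controller, a separate channel carries
# commands back down to the device. Names are illustrative only.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class EdgeThermostat:
    temperature: float = 21.0                        # current set point (C)
    occupied: bool = True
    telemetry: deque = field(default_factory=deque)  # upstream channel
    commands: deque = field(default_factory=deque)   # downstream channel

    def report(self):
        # The edge device pushes a reading onto the telemetry channel.
        self.telemetry.append({"occupied": self.occupied,
                               "temperature": self.temperature})

    def apply_commands(self):
        # The edge device drains its command channel and acts on each command.
        while self.commands:
            cmd = self.commands.popleft()
            if cmd["action"] == "set_temperature":
                self.temperature = cmd["value"]

def central_controller(device):
    # The central site reads telemetry and, on the separate command channel,
    # lowers the set point when the house is reported empty.
    while device.telemetry:
        reading = device.telemetry.popleft()
        if not reading["occupied"]:
            device.commands.append({"action": "set_temperature",
                                    "value": 16.0})

thermostat = EdgeThermostat(occupied=False)
thermostat.report()                  # telemetry goes up
central_controller(thermostat)       # command comes back down
thermostat.apply_commands()
print(thermostat.temperature)        # the controller lowered the set point
```

A uni-directional stream processor would cover only the `report` half of this exchange; the point of the second channel is that the same pipeline that observes the device can also steer it.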
However, the Onyara purchase is also the latest symptom of a gradual splintering of the Hadoop environment, Gilbert writes. Until recently, Hadoop vendors all provided the same open-source core capabilities and differentiated on manageability. Cloudera Inc.’s Manager and Navigator, for example, did not change the core compute engines such as MapReduce, Hive and Pig. Cloudera ships its own analytic MPP SQL database, Impala, but this uses the standard Parquet data format and Hive HCatalog, so data is not locked in.
The fast growth of the Hadoop market, however, is beginning to splinter the community, Gilbert writes. Hortonworks has always been strongly committed to using Apache projects for core compute engines and management tools. With DataFlow, stream processing, which Gilbert says is becoming a core compute engine, may diverge across vendors.