Hortonworks finally takes data stream processing to heart
After years of presentations that focused on how to analyze, enhance and even expand views of data as it landed in the cluster, Hortonworks Inc. finally admitted that it had conveniently ignored how to actually build a process that streamed in the data itself.
With the company’s announcement this week of Streaming Analytics Manager (SAM) as part of Hortonworks DataFlow 3.0, Hortonworks took a major step toward giving business analysts the ability to create streaming applications without having to write a single line of code.
The new streaming data tool was demonstrated during today’s keynote at DataWorks Summit in San Jose, California, in a presentation by Joseph Witt, senior director of engineering for Hortonworks, and George Job Vetticaden, vice president of Hortonworks product management and emerging products.
“Before today, we just hand-waved at how to do stream processing,” Witt said.
The company’s SAM has changed that dynamic. In response to concerns that building streaming analytics needed to become easier, Hortonworks has introduced a tool with a simple drag-and-drop interface for building streaming applications in real time.
“We’ve shielded a lot of hairy details away from the developer. It’s not just easier, but quite fun,” Vetticaden said.
SAM includes a schema registry that lets applications interact with each other across streaming engines like Apache NiFi, which automates the flow of data between systems, and Apache Storm, an open-source distributed real-time computation system. In the DataWorks Summit keynote this morning, the two Hortonworks executives built a sample application that visualized data streams for a fleet of trucks, while predicting which vehicles and drivers would exceed the speed limit on a particular route.
“These are predictive analytics that work without writing any code,” Vetticaden said.
Yahoo uses Hive at massive scale
The keynote session also offered a look at how the various Apache Hadoop-based tools are being used to address critical needs in the enterprise. (Apache Hadoop is an open-source software framework used for storing, processing and analyzing big data.)
Sumeet Singh, senior director for cloud and big data platforms at Yahoo Inc., described how the company is relying on Apache Hive — a data warehouse software project built on top of Hadoop — to process half a billion records for each database query.
“Apache Hive is one of the predominant technologies that we’ve been shaping,” Singh said.
Singh said that Yahoo has introduced GPU and high-memory servers to facilitate the integration of machine learning into its operations. For the past two years, the company has also been running Caffe, a deep learning framework, and TensorFlowOnSpark, which brings TensorFlow programs onto Apache Spark clusters.
“Open source is big for us,” Singh added.
The presentations from the Yahoo and Hortonworks executives underscored the growing influence of data science in the enterprise, as companies look for simplicity and a return on their information technology investment. This is leading to more focus on how to frame the big data conversation and what tools, like Hortonworks’ SAM, make the most sense.
“You don’t monetize the data,” said Bill Schmarzo, chief technology officer for the big data practice at Dell EMC, Dell Technologies Inc.’s infrastructure group. “You’re going to monetize the insights that come from the data.”
Schmarzo, who spoke at the DataWorks keynote session this morning, teaches a class in Silicon Valley on how to get business people to think like data scientists. “It’s not about technology; it’s about business models,” he said.
Schmarzo challenged the gathering to better understand the economic value of data and create business models with analytics to deliver real results to the bottom line. Business executives live by the “four M’s,” which are “make me more money,” he said.
Watch the complete keynote video below, and be sure to check out more of SiliconANGLE’s and theCUBE’s independent editorial coverage of DataWorks Summit US 2017.