UPDATED 16:21 EDT / JUNE 06 2017

What’s the big deal with event-by-event streaming in Spark’s 2.2 release?

Apache has announced that its Spark 2.2 release will finally break from near real-time data streaming into true real-time, event-by-event streaming. How is this difference of milliseconds relevant, anyway?

“Your streaming capabilities dictate the class of apps that you’re appropriate for,” George Gilbert (@ggilbert41) (pictured, right) told fellow co-host David Goad (pictured, left) on theCUBE, SiliconANGLE Media’s mobile live streaming studio. Gilbert and Goad discussed the announcement during the Spark Summit event in San Francisco, California.

Applications were a focus during the Summit’s keynote earlier today, said Gilbert, who is also head big data and analytics researcher at Wikibon.com.

“Spark started out as … offline analytic preparation of data that was in data lakes, and it’s moving more into the mainstream of production apps,” he said, noting that event streaming helps make predictive machine learning applications possible.

Until now, Spark’s structured streaming “had to manage a cluster; it was working with a query optimizer; and so it would basically batch up events in groups that would go through, like, once every 200 milliseconds to a full second,” he said.

Spark has re-engineered structured streaming in the 2.2 release to the tune of one millisecond latency for event-by-event streaming, Gilbert explained.
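
To make the latency gap concrete, here is a rough back-of-the-envelope sketch in Python. It is not Spark code and does not use any Spark API. It simply assumes that, under micro-batching, an event sits idle until the next batch boundary fires, and it ignores processing time entirely. The function names and the sample workload (one event every 10 milliseconds) are hypothetical, chosen only for illustration.

```python
import math

def micro_batch_latency_ms(arrival_ms: float, interval_ms: float) -> float:
    """Time an event waits for the next batch boundary (processing time ignored)."""
    next_boundary = math.ceil(arrival_ms / interval_ms) * interval_ms
    return next_boundary - arrival_ms

def average_wait_ms(arrivals_ms, interval_ms: float) -> float:
    """Mean wait across all events for a given batch interval."""
    waits = [micro_batch_latency_ms(t, interval_ms) for t in arrivals_ms]
    return sum(waits) / len(waits)

# Hypothetical workload: one event every 10 ms over one second.
arrivals = [float(t) for t in range(0, 1000, 10)]

# Batch intervals taken from the range Gilbert quotes: 200 ms to a full second.
for interval in (200.0, 1000.0):
    print(f"interval={interval:.0f} ms, "
          f"average wait={average_wait_ms(arrivals, interval):.1f} ms")
```

Under these simplified assumptions, the average wait grows with the batch interval, which is the delay that event-by-event processing is meant to remove.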

Continuous coming attractions

A special onstage presentation during the keynote showed an application making predictions about cars in James Bond movies using streaming event data and machine learning.

The idea behind this is that with streaming data and machine learning in perfect parallel, apps can perform predictive analytics at a faster clip, Gilbert stated.

The implications of this will continue to be parsed throughout the summit. “The big thing is what’s the sweet spot? What type of apps? What are the edge conditions?” he asked.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s independent editorial coverage of Spark Summit 2017.

Photo: SiliconANGLE
