Beyond batch with Spark 2.0: The new continuous data application | #SparkSummit
Building the perfect data application is tricky business. Long hours are spent figuring out what data to use, wrangling and aggregating, writing code — and then new, perhaps contradictory, data arrives upsetting the model at its foundation. The fluctuating nature of data requires applications that are similarly changeable.
Michael Armbrust, software engineer and lead developer of the Spark SQL project at Databricks, Inc., said this very problem led to the development of Spark 2.0. Speaking with John Walls and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, during Spark Summit 2016, he described a common problem he’d run into with customers.
“As soon as they get it working in batch mode, you immediately have the question, ‘Wait, but new data arrived. What’s the answer now?’ And typically, this was starting from scratch,” he said.
Armbrust said that batch should be looked at as a “sandbox” where you experiment and figure out what type of application you need. Then, using the exact same code, make that application streaming and continuous using Spark’s new tools. “The Spark optimizer — this thing we call Catalyst — should be able to figure out how to do that incrementalization,” he said.
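To make the idea of incrementalization concrete, here is a minimal sketch in plain Python. It is not Spark code and does not show how Catalyst works internally; the function and class names (`batch_counts`, `IncrementalCounts`) and the sample events are hypothetical, chosen only to illustrate the principle that a continuous application can fold new records into running state and still reach the same answer as a batch job rerun over all the data.

```python
from collections import Counter
from typing import Iterable


def batch_counts(events: Iterable[str]) -> Counter:
    """Batch mode: recompute the answer from every record seen so far."""
    return Counter(events)


class IncrementalCounts:
    """Continuous mode: keep running state and fold in only the new records."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def update(self, new_events: Iterable[str]) -> Counter:
        # Only the newly arrived records are processed; earlier work is reused.
        self.state.update(new_events)
        return self.state


# Hypothetical sample data: records already processed, then late arrivals.
history = ["click", "view", "click"]
arrivals = ["view", "purchase"]

incremental = IncrementalCounts()
incremental.update(history)
result = incremental.update(arrivals)

# The incremental answer matches a full batch recomputation over all records.
assert result == batch_counts(history + arrivals)
```

In this sketch the counting logic is written once and applied either to the whole data set or to each new slice of it, which is the property Armbrust describes wanting the optimizer to provide automatically.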
The open-source win-win
Armbrust spoke enthusiastically about Databricks’ Community Edition, a new, free, cloud-based, open-source big data platform. “Anybody can use this for free. You sign up. You get six gigabyte clusters. All you need is an email address,” he said.
He stated that open source has always been a core value for Spark and Databricks. Opening the software to the community, he said, lets users give back by pointing out, “Hey, you’re missing this optimization,” and then adding it themselves. “That is the power of open source. I think that alone is going to give us a velocity that’s hard to match in closed-source software,” he said.