Spark doubles down on streaming, data warehousing and deep learning - SiliconANGLE

UPDATED 22:30 EDT / JULY 19 2017

Spark doubles down on streaming, data warehousing and deep learning

by Mark Albertson

The Apache Spark community has been wrestling with a wide range of big data challenges in information technology, and Databricks Inc., which was founded by Spark’s creators, is taking steps to address the enterprise need for machine learning and speedier data processing.

“Rather than giving people the fish, you give them the tools to fish,” said Reynold Xin (pictured), chief architect and co-founder at Databricks.

Xin stopped by theCUBE, SiliconANGLE’s mobile livestreaming studio, and answered questions from hosts David Goad (@davidgoad) and George Gilbert (@ggilbert41) during Spark Summit 2017 in San Francisco, California. They discussed changes for the Spark platform, the role of storage systems in analytics and the next big challenge for the Spark community. (* Disclosure below.)

Deep learning is a priority

One of the tools announced by Databricks during the Spark Summit was Deep Learning Pipelines, an open-source library designed to give users the ability to create neural networks for data processing. “We’re hoping to democratize deep learning,” Xin said.

Seeking to dramatically speed up data processing, Databricks has also blended a Structured Streaming tool into its enterprise portfolio. Databricks customers processed 3 trillion records last month using Structured Streaming and brought latency down to the three-millisecond range, according to Xin.

Databricks is also working to improve the visibility and what Xin termed “debug-ability” of big data jobs. Improving the performance and capability of data warehousing features in Spark will also increase job processing speed, he added.

While storage systems have “matured” and Spark can work effectively with a variety of them, Xin is not prepared to include storage in an analytical role just yet. “It doesn’t make sense to build storage systems for analytics at this point,” said the Databricks co-founder.

Despite the release of new enhancements, the challenge for the Spark community will continue to be finding ways to make data management and deep learning tools easier to use. “The bar to entry is very high for these tools. It’s what we focus on a lot at Databricks,” Xin concluded.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
