UPDATED 16:20 EDT / JULY 25 2017

BIG DATA

Could Apache Spark become a universal computation engine?

Spark Summit keynotes are known for their surprises, and this year the stand-out changes were in data streaming, with sub-millisecond latencies predicted for some workloads. With multiple avenues open for potential success, the community is watching as Spark matures to fulfill the promise of what it could be. But does that promise include becoming a database?

Exploring the gap between theoretical possibilities and reality, Matthew Hunt (pictured), technologist at Bloomberg LP, discussed the maturation of Spark with George Gilbert (@ggilbert41) and David Goad (@davidgoad), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during this year’s Spark Summit event in San Francisco, California.

As a pioneer of streaming media, Bloomberg has a long history of developing apps for news and finance and has developed its own relational database, ComDB2. “Everyone needs a database,” Hunt said, adding that most companies do not have the resources to develop their own. This leads to the question: Can Spark become a database?

Hunt believes that Spark has the promise to become a universal computation engine. Describing a universal system as having a distributed file store, a database with transactional semantics, extensible analytics and the ability to stream data in, he asked, “How close can you come to that?”

Maturity means real-world use

Although the dream might be a universal system, the more practical question is how to make Spark and other databases work well together.

“If you have to master 5,000 skills and 200 different products, that’s a huge impediment for real-world usage,” said Hunt, who sees practical usage coalescing around a smaller set of options.

Hunt predicted that Apache Arrow, which powers columnar in-memory analytics, is about to explode because “it lets you connect these systems radically more efficiently in a standardized way.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Video by SiliconANGLE
