UPDATED 13:30 EDT / APRIL 17 2017

BIG DATA

How Uber cruised Apache data frameworks and arrived at Flink

It’s hard to overstate how heavily ride-sharing apps rely on fresh data streams to do business. So what makes a streaming data framework good enough for Uber?

Uber has run the gamut from Apache Spark to Apache Samza to Flink, most recently, said Chinmay Soman (pictured), staff software engineer at Uber Technologies Inc.

“We observed that the Spark stream processing is actually more resource-intensive than some of the other technologies we benchmarked,” he said.

This intensive use of compute power and memory spurred Uber to look for a more efficient solution, Soman told George Gilbert (@ggilbert41), host of theCUBE, SiliconANGLE Media’s mobile live streaming studio, during the Flink Forward event in San Francisco, California, last week.  (*Disclosure below.)

At his previous LinkedIn gig, Soman gained experience with Apache Samza, so he welcomed Uber’s transition to this framework. However, “We hit a scale where Samza, we felt, was lacking,” he said. The reason is that Samza frequently ties into Apache Kafka, a tool for building real-time data pipelines and streaming apps.

With multi-stage pipelines where one stage processes data and sends it to another stage, all intermediate stages go back to Kafka, which is expensive and complex, Soman explained.

“So if you want to do a lot of these use cases, you actually end up creating a lot of Kafka topics and the I/O [input/output] overhead on a cluster shoots up exponentially,” he said. “So that’s why we are looking at Flink. In Flink, you can actually build a multi-stage pipeline and have in-memory cues instead of writing back to Kafka, so it is fast and […] you don’t have to create multiple topics per pipeline.”

Fast recovery from backpressure is another perk of Flink, as is local offloading of data from memory to disk aided by a RocksDB cache, an embeddable persistent key-value store for fast storage, Soman said.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Flink Forward 2017. (*Disclosure: TheCUBE is a paid media partner at Flink Forward. The conference sponsor, data Artisans, does not have editorial oversight of content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE

Since you’re here …

… We’d like to tell you about our mission and how you can help us fulfill it. SiliconANGLE Media Inc.’s business model is based on the intrinsic value of the content, not advertising. Unlike many online publications, we don’t have a paywall or run banner advertising, because we want to keep our journalism open, without influence or the need to chase traffic.The journalism, reporting and commentary on SiliconANGLE — along with live, unscripted video from our Silicon Valley studio and globe-trotting video teams at theCUBE — take a lot of hard work, time and money. Keeping the quality high requires the support of sponsors who are aligned with our vision of ad-free journalism content.

If you like the reporting, video interviews and other ad-free content here, please take a moment to check out a sample of the video content supported by our sponsors, tweet your support, and keep coming back to SiliconANGLE.