UPDATED 19:47 EDT / SEPTEMBER 26 2018

BIG DATA

Kafka alternative Apache Pulsar gains top-level project status

Apache Kafka is finally getting some serious competition.

Apache Pulsar, a distributed messaging platform originally developed at Yahoo! Inc. and open-sourced two years ago, was designated a top-level project by the Apache Software Foundation on Tuesday. The foundation bestows that designation when a technology has acquired a sufficient community of developers and users as well as a governance structure that indicates it’s mature enough to be self-sustaining.

Like Kafka, Pulsar is a scalable, low-latency messaging platform that runs on commodity hardware and provides both publish-and-subscribe and queue semantics. Publish-and-subscribe is a favored technique for building streaming data applications because it lets programs subscribe only to the data streams they need and ignore the deluge of irrelevant data. Queuing, by contrast, delivers each message to just one of several consumers that share the work.
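The difference between the two delivery styles can be illustrated with a toy, in-memory model. This is a minimal sketch, not Pulsar's API: the `ToyBroker` class, its methods and the topic and consumer names are all hypothetical. Each subscription attached to a topic receives every message (publish-and-subscribe), while the consumers that share a single subscription split those messages among themselves, here in simple round-robin order (queuing).

```python
from collections import defaultdict
from itertools import cycle


class ToyBroker:
    """Hypothetical in-memory broker that only illustrates delivery semantics."""

    def __init__(self):
        # topic name -> list of (consumer inboxes, round-robin iterator) pairs,
        # one pair per subscription
        self.subscriptions = defaultdict(list)

    def subscribe(self, topic, consumers):
        """Attach one subscription whose consumers share its messages."""
        inboxes = {name: [] for name in consumers}
        self.subscriptions[topic].append((inboxes, cycle(consumers)))
        return inboxes

    def publish(self, topic, message):
        for inboxes, next_consumer in self.subscriptions[topic]:
            # Every subscription sees the message (publish-and-subscribe);
            # within a subscription, exactly one consumer receives it (queuing).
            inboxes[next(next_consumer)].append(message)


broker = ToyBroker()
analytics = broker.subscribe("orders", ["analytics"])           # sole consumer sees everything
workers = broker.subscribe("orders", ["worker-1", "worker-2"])  # consumers split the work

for i in range(4):
    broker.publish("orders", f"order-{i}")

print(analytics)  # {'analytics': ['order-0', 'order-1', 'order-2', 'order-3']}
print(workers)    # {'worker-1': ['order-0', 'order-2'], 'worker-2': ['order-1', 'order-3']}
```

A consumer that never subscribes to "orders" receives none of these messages, which is the filtering benefit described above.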

Yahoo! created Pulsar as a multitenant messaging system that operates at very high speed. The company said it has run Pulsar in production for more than three years, processing millions of messages per second across millions of topics for Yahoo! Mail, Yahoo! Finance, Yahoo! Sports, Flickr, the Gemini Ads Platform and the Sherpa distributed key-value store.

Designation as a top-level project will give Pulsar additional momentum in recruiting developers, said Matteo Merli, co-founder of startup Streamlio Inc., which sells a real-time analytics suite that incorporates Pulsar. Merli was the original lead developer of Pulsar while at Yahoo! and was recently named vice president of the Pulsar project.

“Kafka has a much larger community; there’s no way to deny that,” he said, but he added that he expects Pulsar to quickly reach parity. “Being an incubator was a big question for most of the potential users of the project,” he said. “Now this clarifies that the project is ready for prime time.”

The Pulsar architecture separates serving and storage layers using Apache BookKeeper as the storage component, resulting in what developers call “a vastly simplified approach to cluster operations” that enables cluster sizes to be adjusted easily and failed nodes replaced without disrupting streams. Pulsar can run on everything from bare-metal machines to Kubernetes clusters both on-premises and in the cloud.

Version 2.0, which was released in May, added a schema registry for better database integration and a lightweight computing framework that enables developers to connect directly with topics, which are named channels that carry messages from producers to consumers. Pulsar also has a compatibility wrapper that lets existing Kafka client applications run on Pulsar with minimal code changes.

In reality, open-source projects don’t compete with each other outside of some good-natured kibitzing by developers. And though Pulsar and Kafka provide similar functionality, “the origins are very different,” Merli said. “We come from the world of database replication and highly scalable systems with strong guarantees, whereas Kafka comes from log collection.”

One area in which Pulsar had an early lead on Kafka was “exactly once” message delivery, which ensures that every message produced at one end of a messaging pipeline is committed at the other end exactly one time, with no losses or duplicates. Kafka didn’t get that capability until about a year ago. A similar concept called “effectively once,” which relies on deduplication, has been baked into Pulsar from the beginning. “It’s very lightweight with little overhead,” Merli said.
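The general idea behind deduplication-based "effectively once" delivery can be sketched as follows. This is a minimal illustration under stated assumptions, not Pulsar's implementation: it assumes each producer tags its messages with a monotonically increasing sequence ID and that the storage side remembers the highest ID it has persisted for each producer. The `DedupLog` class and producer names are hypothetical.

```python
class DedupLog:
    """Hypothetical append-only log that discards retried duplicates."""

    def __init__(self):
        self.entries = []   # messages actually persisted
        self.last_seq = {}  # producer name -> highest sequence ID persisted

    def append(self, producer, seq_id, payload):
        # Assumption: a retry after a lost acknowledgment reuses the same
        # sequence ID, so anything at or below the high-water mark is a duplicate.
        if seq_id <= self.last_seq.get(producer, -1):
            return False
        self.entries.append((producer, seq_id, payload))
        self.last_seq[producer] = seq_id
        return True


log = DedupLog()
log.append("producer-a", 0, "debit $10")
log.append("producer-a", 1, "credit $5")
log.append("producer-a", 1, "credit $5")  # network retry: silently dropped
print(len(log.entries))  # 2
```

Keeping only one number per producer is what keeps the bookkeeping small in this sketch; the trade-off is that each producer must send its sequence IDs in order.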

Photo: Pixabay
