UPDATED 07:00 EST / FEBRUARY 06 2018

BIG DATA

Podium Data takes its data lake catalog to the cloud

With the new 3.2 release of its Data Marketplace, Podium Data Inc. is taking its first steps outside the on-premises world and bringing self-service big data to the cloud.

The Data Marketplace is essentially a data catalog for use with data lakes that eliminates the need for the extensive extraction and massaging procedures that characterize pure-Hadoop models. Podium promotes the software as providing self-service, on-demand access to quality data.

It uses a proprietary data loader to pull information quickly from internal systems, including notoriously difficult-to-access platforms such as mainframes. The information is then converted into a standard format that business users can access.

The architecture requires no cluster-side installation and so works with the most popular big data platforms. With the 3.2 release, users can now mix on-premises and cloud data in any combination, the company said. Podium's architecture separates storage from computing so that data delivery teams can support multiple variations of an analytical application from a single store. With version 3.2, sources now include the Amazon Web Services Inc. and Microsoft Corp. Azure clouds.

“When you look at how data gets into the hands of business people, the traditional data supply chain was heavy on data engineering and programming,” said Chief Executive Paul Barth. “This is a turnkey solution that lets users search for data, put it in shopping carts, combine it and compare it.”

Using a metadata-driven catalog enables the repository to “know what data has been used and the production processes, and it gets smarter over time,” Barth said. Machine learning works on the supply side to learn about data quality and governance standards during the load process.

“When we ingest data into the Marketplace, we build out metadata about that information, pull out dirty data records and set up access control policies,” he said. The platform uses parallel-processing Hadoop engines and a patent-pending algorithm that “looks at every byte and compares it to any technical constraints customers have defined about what is an acceptable record.”
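
Podium's algorithm is proprietary and patent-pending, so its details are not public. As a rough illustration only, the general idea of checking each incoming record against customer-defined constraints and setting failures aside might be sketched in Python as follows. Every name here (`partition_records`, `LoadResult`, the two sample constraints) is hypothetical and is not drawn from Podium's product.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration; not Podium's API.
Record = dict[str, str]
Constraint = Callable[[Record], bool]


@dataclass
class LoadResult:
    # Records that passed every constraint.
    clean: list[Record] = field(default_factory=list)
    # Records that failed, paired with the names of the failed constraints.
    rejected: list[tuple[Record, list[str]]] = field(default_factory=list)


def partition_records(
    records: list[Record], constraints: dict[str, Constraint]
) -> LoadResult:
    """Split records into clean and rejected sets based on named constraints."""
    result = LoadResult()
    for record in records:
        failed = [name for name, check in constraints.items() if not check(record)]
        if failed:
            result.rejected.append((record, failed))
        else:
            result.clean.append(record)
    return result


# Assumed example constraints a customer might define.
constraints: dict[str, Constraint] = {
    "id_is_numeric": lambda r: r.get("id", "").isdigit(),
    "amount_is_decimal": lambda r: r.get("amount", "").replace(".", "", 1).isdigit(),
}

records = [
    {"id": "1", "amount": "19.99"},
    {"id": "x2", "amount": "5.00"},
    {"id": "3", "amount": "n/a"},
]

result = partition_records(records, constraints)
print(len(result.clean), "clean,", len(result.rejected), "rejected")
```

Keeping the names of the failed constraints alongside each rejected record is one way a catalog could accumulate data quality metadata during the load, rather than silently dropping bad rows.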

Nonconforming “ugly records” are set aside and kept out of the production data set. Podium claims its platform speeds delivery of new data to business users by up to 25-fold and cuts data delivery costs by 40 percent. Version 3.2 permits assets inside and outside the cloud to be merged and joined. Barth said elasticity features support hundreds of users and concurrent workloads.

“All of that can be done using our single-node application without having to spin up a new cluster,” he said. “We can right-size the cluster for the size of the load that’s running.”

Pricing was not released. Founded in 2014, Podium Data has raised nearly $12 million in funding.

