UPDATED 09:00 EST / NOVEMBER 27 2018

BIG DATA

Okera brings intelligent schema management to S3 data lakes

Okera Inc., a startup founded by two former Cloudera Inc. executives to simplify the management of large heterogeneous data stores at scale, today is introducing a schema management tool designed to make it easier for companies to find, access and structure data from popular data analytics tools running on top of Amazon Web Services Inc.’s S3 cloud storage service.

The company, which launched out of stealth mode in May with $14.6 million in venture financing, specializes in data governance for data lakes, which are collections of largely unstructured data that aren’t organized according to a schema. A schema is a formal description of how data is structured, such as the tables, fields and data types in a database and the relationships among them. Schemas are typically applied to structured data before it is used in production, but unstructured data can defy such rigid classification.

“All of the functionality that we’ve become used to in the world of relational databases has been missing from data lakes,” said Okera CEO Amandeep Khurana. “We’re bringing that functionality.”

The new release of Okera’s Active Data Access Platform features what the company calls “intelligent schema management,” which it says enables data administrators to automatically discover new data sets, infer their schemas and assign universal access permissions at a fine-grained level.
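To make the idea of schema inference concrete, here is a minimal Python sketch that scans newline-delimited JSON records and proposes a column type for each field. It is an illustration only, not Okera's implementation: the function names, the type labels and the sample records are all assumptions made for this example.

```python
import json
from collections import defaultdict


def infer_type(value):
    """Map a parsed JSON value to a simple, hypothetical column type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    return "string"


def infer_schema(lines):
    """Propose one type per field from an iterable of JSON-encoded records."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for field, value in json.loads(line).items():
            seen[field].add(infer_type(value))

    schema = {}
    for field, types in seen.items():
        types = types - {"null"}  # nulls do not constrain the type
        if len(types) == 1:
            schema[field] = next(iter(types))
        elif types == {"bigint", "double"}:
            schema[field] = "double"  # widen mixed numerics
        else:
            schema[field] = "string"  # fall back when types conflict or are unknown
    return schema


# Made-up sample records standing in for a newly discovered data set.
SAMPLE = [
    '{"user_id": 1, "email": "a@example.com", "spend": 10}',
    '{"user_id": 2, "email": null, "spend": 12.5}',
    '{"user_id": 3, "email": "c@example.com", "spend": 7, "tags": ["new"]}',
]

if __name__ == "__main__":
    print(infer_schema(SAMPLE))
```

A production tool would also have to sample large files, handle nested structures and other file formats, and reconcile the inferred schema with existing metadata, none of which this sketch attempts.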

It also features a new file system manager that the company said streamlines the discovery, access, governance and use of unstructured data in S3 data stores. Supported analytics platforms include Amazon’s Elastic MapReduce, Apache Hive, Presto, Apache Spark and business intelligence software from Tableau Software Inc., Birst Inc. and Qlik Inc.

The platform is similar to a data catalog in that it enables data to be registered and governed according to an assigned set of metadata. However, Khurana drew a distinction. “Most catalogs focus on business metadata. We are the technical and operational metadata,” he said. “With schema ingest, we’re making life easier for the data producer who’s on-boarding the data set.”

Data lakes have been plagued by a lack of tools to provide structure and access control, both of which are essential to performing reliable analysis without risking inadvertent disclosure.

Okera says its platform enables administrators not only to keep track of all their data in one place but also to enforce access rules down to the field level. Okera says it can automate these administrative procedures at scale, and that it is already managing multipetabyte data lakes for customers.
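Field-level access enforcement can be pictured as a filter applied to every record before it reaches the person running a query. The Python sketch below is a generic illustration under assumed names: the roles, the field names and the `filter_record` helper are hypothetical and do not reflect Okera's actual policy model or API.

```python
# Hypothetical policy: which fields each role is allowed to read.
POLICY = {
    "analyst": {"user_id", "spend"},
    "admin": {"user_id", "email", "spend"},
}


def filter_record(record, role, policy=POLICY):
    """Return a copy of the record containing only fields the role may read.

    Unknown roles get an empty field set, so access is denied by default.
    """
    allowed = policy.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


if __name__ == "__main__":
    row = {"user_id": 1, "email": "a@example.com", "spend": 10}
    print(filter_record(row, "analyst"))  # the email field is withheld
    print(filter_record(row, "guest"))    # unknown role sees nothing
```

Denying access by default for unknown roles is a deliberate choice in this sketch, since inadvertent disclosure is the risk described above.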

Pricing is based on usage, but Okera didn’t provide details.
