AI powers the catalogs of next-generation big data
Data’s relevance doesn’t always jump out at you. It takes work to distill useful insights from enterprise data lakes that are increasingly too large, diverse and dynamic to be explored through entirely manual methods.
Discoverability and visibility are what unlock data’s value. More enterprises are embracing big-data catalogs to harness insights that would otherwise stay dormant and overlooked. Recognizing this growing demand, more data management solution providers are building sophisticated catalogs into their solution portfolios, as discussed in Wikibon’s recent big-data market study.
Artificial intelligence is a key force driving the evolution of big-data catalogs into enterprisewide platforms for collaborative curation. Increasingly, providers are integrating AI into their offerings to help users discover, refine, explore, analyze and apply complex data sets more rapidly and intelligently to diverse applications.
Among data management vendors, Informatica LLC has set the pace in weaving AI-infused metadata-management capabilities into its solution portfolio. In the breadth and sophistication of its AI capabilities, Informatica stands apart from other data catalog solution providers such as Alation Inc., Cloudera Inc., Hortonworks Inc. and Microsoft Corp.
The company briefed Wikibon last summer on its roadmap to integrate AI as an enabling capability across its entire product line, with its Enterprise Data Catalog at the center. At that time, Informatica had already incorporated AI — which it brands as “CLAIRE” — into its catalog to automate data clustering, tagging, and domain/entity recognition. The AI-powered catalog intelligently scans data assets from across the enterprise and automatically adds business context metadata. In its data integration offerings, Informatica had already integrated such CLAIRE AI technologies as genetic algorithms (to identify complex data sub-structures), natural language processing algorithms (to drive semantics-based modifications to data models) and machine learning algorithms (to parse clickstream, log, system, JSON and other “internet of things” data).
At Informatica World 2017, CEO Anil Chakravarthy spoke to theCUBE about how CLAIRE figures into its product roadmap going forward. “When we built CLAIRE,” he said, “we did not invent the artificial intelligence or the machine learning. A lot of that is already available. So we took a lot of the best algorithms in machine learning and applied them to metadata and data management. That’s the secret sauce. It’s not the building the AI itself, it’s the use of the AI for data management.”
Chakravarthy emphasized that CLAIRE is “not a product. It’s … a cloud-scale, AI-powered real time engine that powers other products.” He added that CLAIRE will be embedded in Informatica products so that customers won’t have to deploy it explicitly. “So it means once you have any product like our enterprise data catalog or data governance solutions, you’re starting to use CLAIRE and then you can use CLAIRE for other use cases as well.”
In a new product announcement today, Informatica rolled out new features that infuse CLAIRE’s AI smarts more deeply into the catalog at the heart of its solution portfolio. The company’s core announcements were twofold: It has introduced enhanced AI algorithms for improved curation and classification of structured and unstructured data, and it now provides an integrated metadata-driven intelligent API.
These new features support self-service discovery of the catalogued data that is best for the task at hand, such as training a machine learning model or curating customer datasets. They also enable users, such as data scientists and stewards, to apply the catalogued data via a single click to whatever application environment they’re working within. In addition, Informatica now provides single-click deployment of the catalog to Amazon Web Services and Microsoft Azure, so all of these features are available within those public clouds.
Over the next several years, Wikibon expects to see big-data catalogs become ubiquitous in enterprise data environments, with AI, intelligent metadata, recommendation engines and automated task-specific guidance as essential features. These capabilities will help organizations to manage their growing information assets across more complex hybrid clouds.