Collaborating to drive data cataloging | #BigDataNYC
The exponential growth of data in both volume and variety makes it necessary to provide reference resources for collaboration among enterprise users, and one partnership is taking on the challenge. With Q4 plans to release a new connectivity layer that catalogs queries from popular compute engines like SparkSQL and IBM Watson DataWorks, Alation, Inc. has caught the eye of Teradata Corp. for a reseller partnership.
Stephanie McReynolds, VP of Marketing at Alation, and Mark Shainman, marketing director at Teradata, joined Dave Vellante (@dvellante) and Peter Burris (@plburris), cohosts of theCUBE, from the SiliconANGLE Media team, during BigDataNYC 2016 to discuss their partnership, how Data Catalog works for customers and how to handle big data.
Do you have a data lake or a data swamp?
Vellante brought up the point that while there is much complaining about Hadoop, including its data lake concept, it did get the data to where it needed to be. How companies deal with that data after collecting it is the issue, and that’s where Alation and Teradata come into play.
“Is it a data lake or a data swamp? … Different organizations are [all] at different phases of figuring out the data lake … [but they all] need governance,” said McReynolds. As more users come into the lake, if they have no way to see what is already there and what the quality of that information is, the data, so carefully collected, can be useless. So it’s necessary to have “a catalog that reads and interprets data … as we get more people running queries … we need something like a data catalog to see and understand what’s in there,” continued McReynolds.
Presto, an open-source SQL query engine developed at Facebook, was designed and written for interactive analytics. It approaches the speed of commercial data warehouses while scaling to the size of very large organizations. “[Presto was built by Facebook], then they open-sourced it. [Teradata] is a major contributor to the code base,” said Shainman. Teradata sees Presto as filling a specific niche: running low-latency, interactive queries against large data sets for many concurrent users.
Handling Big Data
The discussion moved to Teradata’s play in Big Data. Vellante asked, “What’s the most important part of your Big Data?”
Shainman answered: “Hadoop and Big Data are all synergistic to the data warehouse … [we realize] that multiple platforms are going to exist in one organization. … We’ve moved away from this silo[ed] set up … Alation brings in the governance and cataloging.”
Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of BigDataNYC 2016.
Photo by SiliconANGLE