Pentaho offers blueprint to keep data lakes from becoming swamps
Pentaho Corp. is throwing information-addled big data teams a lifeline with a blueprint that helps them create a stable and repeatable process for ingesting big data into Hadoop data lakes.
“Filling the Data Lake” is a framework and process for untangling the web of incompatible data that many big data projects must contend with. Ventana Research Inc. has estimated that organizations deploying Hadoop projects spend 46 percent of their time preparing data for analysis or reviewing the quality and consistency of data, rather than actually using it.
“It’s very easy to get raw data into Hadoop,” said Chuck Yarbrough, senior director of solutions marketing at Pentaho. “The problem is when you have lots of data sets where not all files are the same.” For example, financial institutions often load thousands of CSV files that contain similar data but are formatted with different columns and metadata.
A Forrester Research Inc. Consulting report commissioned by Pentaho found that more than half of firms using Hadoop blend 50 or more distinct data sources for analytics, and about one-third blend 100 or more.
“When you dump data into Hadoop, you don’t get a nice clean data lake; you get a data swamp,” Yarbrough said.
Pentaho says that by following its blueprint, organizations can reduce dependence on hard-coded data ingestion procedures, manage a changing array of data sources, establish repeatable processes at scale and maintain control and governance along the way.
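To make the idea of replacing hard-coded ingestion with a metadata-driven step more concrete, here is a minimal Python sketch. It is not Pentaho's blueprint or any Pentaho API; the canonical schema, the alias table and the normalize_csv function are all hypothetical names invented for illustration. It shows how CSV files with similar data but different column names and orders might be mapped onto one shared layout before loading.

```python
import csv
import io

# Hypothetical canonical schema and header aliases (assumptions for illustration).
CANONICAL_COLUMNS = ["account_id", "amount", "currency"]
COLUMN_ALIASES = {
    "acct": "account_id",
    "account": "account_id",
    "account_id": "account_id",
    "amt": "amount",
    "amount": "amount",
    "ccy": "currency",
    "currency": "currency",
}


def normalize_csv(text):
    """Map one CSV's headers onto the canonical schema.

    Unknown columns are dropped; missing canonical columns are left as None.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        row = dict.fromkeys(CANONICAL_COLUMNS)
        for header, value in raw.items():
            if header is None or value is None:
                continue  # skip ragged rows with extra or missing fields
            canonical = COLUMN_ALIASES.get(header.strip().lower())
            if canonical is not None:
                row[canonical] = value.strip()
        rows.append(row)
    return rows


# Two made-up files carrying similar data in different layouts.
file_a = "acct,amt,ccy\n1001,25.50,USD\n"
file_b = "Currency,Account,Amount,Branch\nEUR,2002,10.00,Paris\n"

combined = normalize_csv(file_a) + normalize_csv(file_b)
for record in combined:
    print(record)
```

Adding a new source in this sketch means extending the alias table rather than writing another loader, which is the kind of repeatability the blueprint is said to aim for.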
Pentaho has created four other blueprints related to optimizing big data projects. Visitors must fill out a registration form in order to receive the information.
Photo by Ed Dunens via Flickr CC