Drowning in data with no way forward?
Big Data is a potential source of both business advantage and liability for companies, writes Wikibon CTO David Floyer in “The Growth and Management of Unstructured Data.” Over the next few years, the volume of Big Data that companies are expected to capture, including log files, unstructured office documents, and audio and video such as security footage, will grow astronomically.
Extracting value from this data requires good data management: first, to control its growth and eliminate duplication, and second, to make it possible to quickly identify the data that is relevant to each analysis project. This, Floyer writes, requires a step-by-step process.
Addressing Big Data management challenges
Big Data presents several management challenges. First, it is typically divided among multiple filers and systems rather than unified in a single place. Second, it lacks overall structure, and reading different kinds of Big Data requires different technologies. Searching across Big Data files to identify the subset that is useful to a specific analysis requires classification metadata that, among other things, indicates what each file contains. However, most file creation tools, Microsoft Office for instance, generate only the most basic metadata, and they rarely give a file's creators an easy way to add more.
Expert recommendations
Floyer recommends that companies create a universal method for automatically generating system and user metadata at the time a file is created. The files should be stored in a de-duplicated global file system that avoids data replication, rather than in multiple fragmented systems. This file system should be integrated with modern extraction and analysis tools.
![Figure: savings from migrating unstructured data to a global file system (source: Wikibon)]()
This will involve some up-front cost, but it will avoid large amounts of wasted and duplicated effort later, when the data is used. The figure above illustrates the savings that can be realized by migrating unstructured data to a global file system, based on Wikibon research.
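To make the recommendation concrete, here is a minimal sketch of the two ideas together: metadata captured automatically when a file is stored, and de-duplication by content hash. It is an illustration only, not a method from the Wikibon report. The `FileRecord` schema, the `DedupStore` class, the tags and the sample files are all hypothetical, and a real global file system would persist data rather than hold it in memory.

```python
import hashlib
import mimetypes
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileRecord:
    """Metadata captured at store time (hypothetical schema)."""
    name: str
    owner: str
    media_type: str
    size_bytes: int
    created_at: str
    digest: str
    tags: list = field(default_factory=list)


class DedupStore:
    """Toy in-memory store: each unique content blob is kept once."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes, one copy per unique content
        self.records = []   # one FileRecord per logical file

    def put(self, name, content, owner, tags=None):
        digest = hashlib.sha256(content).hexdigest()
        # Keep the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, content)
        media_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        record = FileRecord(
            name=name,
            owner=owner,
            media_type=media_type,
            size_bytes=len(content),
            created_at=datetime.now(timezone.utc).isoformat(),
            digest=digest,
            tags=list(tags or []),
        )
        self.records.append(record)
        return record

    def find(self, tag):
        """Return the records carrying a given user-supplied tag."""
        return [r for r in self.records if tag in r.tags]

    def bytes_saved(self):
        """Logical bytes stored minus physical bytes actually kept."""
        logical = sum(r.size_bytes for r in self.records)
        physical = sum(len(b) for b in self.blobs.values())
        return logical - physical


# Made-up example: two users save the same document under different names.
store = DedupStore()
store.put("q3-report.docx", b"quarterly numbers", owner="alice", tags=["finance"])
store.put("q3-report-copy.docx", b"quarterly numbers", owner="bob", tags=["finance"])
store.put("door-cam.log", b"motion detected", owner="security", tags=["logs"])
finance_files = store.find("finance")
```

Both finance files appear in a tag search, but their identical contents are held only once. That is the kind of saving a de-duplicated store is meant to deliver.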
This, Floyer writes, is a long journey that starts by quantifying the growth of different components of unstructured data, consolidating that data and eliminating redundancy, and securing it. That allows IT to develop a pragmatic plan to add structure and functionality to derive value from this data.
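The first step of that journey, quantifying what exists, can be approximated with a simple survey script. The sketch below is an assumption-laden illustration rather than a tool from the report. The `survey` function is hypothetical, it groups files by extension as a rough stand-in for "components" of unstructured data, and it reads each file fully into memory, which is only reasonable for small test directories.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def survey(root):
    """Return (bytes per file extension, groups of duplicate file names)."""
    bytes_by_type = defaultdict(int)
    names_by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()  # sketch only; stream large files in practice
        bytes_by_type[path.suffix.lower() or "(none)"] += len(data)
        names_by_digest[hashlib.sha256(data).hexdigest()].append(path.name)
    # Files sharing a digest have identical content and could be consolidated.
    duplicates = [names for names in names_by_digest.values() if len(names) > 1]
    return dict(bytes_by_type), duplicates


# Demo on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "app.log").write_bytes(b"line one\n")
    (Path(tmp) / "memo.docx").write_bytes(b"draft")
    (Path(tmp) / "memo-copy.docx").write_bytes(b"draft")
    totals, duplicates = survey(tmp)
```

Running the same survey at intervals and comparing the totals would give a rough growth rate per file type, and the duplicate groups show where consolidation would reclaim space.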
About Wikibon research
As with all Wikibon written research, the complete report is available without charge on the Wikibon Web site. IT professionals are invited to register for free membership in the Wikibon community, which allows them to influence the direction of Wikibon research, participate in that research, and post their questions, comments and relevant research on the Wikibon site.
Graphic Courtesy Wikibon.org
feature image by Musebrarian via photopin cc