UPDATED 06:02 EST / NOVEMBER 10 2011

Digital Reasoning Focuses on Pattern Recognition in Unstructured Data on Hadoop

Cloudera user Digital Reasoning is focused on developing ontologies for unstructured data, basically finding the patterns that allow that data to be analyzed, company CEO Tim Estes told SiliconAngle CEO and Founder John Furrier and Wikibon.org Chief Analyst David Vellante in an interview webcast live from HadoopWorld 2011 on SiliconAngle.tv.

The company, which has about 30 employees, started as a defense intelligence contractor and is now expanding into business analytics. Under the covers, Estes said, its technology is basically a clustering algorithm that establishes a context for a specific piece of data of interest. For instance, it looks at the context in which a particular word is used, finds similarities among those uses, and arranges them into a hierarchy. This carves specific blocks out of a mass of unstructured data that can then be counted and used as the basis for analysis. If the word is “toothbrush,” it might look at how often that word appears with “morning” or “toothpaste” to establish patterns of when or with what a toothbrush might be used.
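
The toothbrush example can be illustrated with a few lines of Python. This is only a minimal sketch of word co-occurrence counting, not Digital Reasoning's actual algorithm, which the article does not describe in detail. The function name `cooccurrence_counts`, the `window` parameter and the sample sentences are all hypothetical, chosen for illustration.

```python
import re
from collections import Counter


def cooccurrence_counts(documents, target, window=5):
    """Count the words that appear within `window` tokens of `target`.

    Hypothetical helper for illustration; not the company's method.
    """
    counts = Counter()
    for doc in documents:
        # Lowercase the text and split it into simple word tokens.
        tokens = re.findall(r"[a-z']+", doc.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            # Look at the neighbors on both sides of the target word.
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Made-up sample text, standing in for a mass of unstructured data.
docs = [
    "I grab my toothbrush and toothpaste every morning",
    "A new toothbrush works best with fresh toothpaste",
    "The morning train was late again",
]

counts = cooccurrence_counts(docs, "toothbrush")
print(counts["toothpaste"], counts["morning"], counts["train"])
```

Counts like these are the sort of raw material a clustering step could then group into a hierarchy of related terms. A real system would need far more text and a more careful notion of context than a fixed word window.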

This, said Digital Reasoning President and COO Rob Metcalf, can be applied to finding patterns that can be useful to different kinds of businesses. “Customers have large amounts of clustered data, and they are trying to identify actors.” In government that might mean identifying potential bad actors based on word patterns in their communications. Financial traders might look for patterns that indicate when a stock or currency is likely to rise or fall in value. Law firms might use the technology to determine who knew what facts when by examining emails and other written communications. In public health it could be used to identify patterns of disease by examining huge numbers of medical records. This could be helpful, for instance, in identifying disease outbreaks early or in planning staffing for a hospital.

So far Digital Reasoning has been focused on developing the core technology. “The year ahead of us is the year of developing applications,” Estes said. But the company is already seeing strong interest in the technology and expects to have no problem finding ways to apply it to business needs.

“Basically the problem is that with the explosion of data we can no longer afford to apply humans to understanding all the material involved,” he said. “A decade ago you could search and read the top 1% of results on a subject. Now you may be able to look at the top 0.5% or 0.1% of the references a simple Google search may turn up. It gets really bad when you can only look at the top 0.01%. And the result is you have no confidence that the conclusions you reach are legitimate. We let you apply machine intelligence to analyze very large amounts of data to produce much more accurate conclusions.”

