UPDATED 13:40 EST / FEBRUARY 29 2012

Strata Conference Day 1: Data Mining and Predictive Models

One of the biggest conferences dedicated to big data kicked off yesterday, and we at SiliconAngle are here to bring you highlights of the event and exclusive interviews. Last year’s conference focused heavily on Hadoop, and it’s still a central topic this year, joined by a lot of data mining and predictive analytics.

John Furrier and Dave Vellante brought in sociologist Marc Smith of the Social Media Research Foundation for an interview on theCube. He believes that now is the perfect time to build tools that educate the public about big data and social data, saying it gives us a leg up in our knowledge of the “big picture.”

Whoever “races to the top of the Big Data mountain first will see that vista,” and that works to their advantage, as they’ll be the first to explore, develop and use big data patterns. It also enables social scientists to understand society.

To support Smith’s claim, here’s Christopher Berry’s take on Strata Conference Day 1 in a few bullets:

On Web Mining

• The web is an infinite series of edge cases.
• A robots.txt file is not a terms-of-use document.
• Scraping should be done ethically: respect the robots.txt file, respect the site’s rate limits, and be transparent about who you are and why you’re taking data from them.
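
As a rough illustration of the scraping advice above, here is a minimal Python sketch that checks a robots.txt policy before fetching a page and picks up the site's requested crawl delay. It uses the standard library's urllib.robotparser. The bot name, the sample robots.txt content and the helper names build_parser and fetch_plan are all assumptions made for this example, not anything presented at the conference.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot name; a real scraper should identify itself honestly
# and ideally include contact details in its User-Agent header.
USER_AGENT = "example-research-bot"

# Made-up robots.txt content, used so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_plan(parser, url, default_delay=1.0):
    """Return (allowed, delay_seconds) for a URL under the parsed policy."""
    allowed = parser.can_fetch(USER_AGENT, url)
    delay = parser.crawl_delay(USER_AGENT)
    # Fall back to a conservative default when the site states no delay.
    return allowed, float(delay) if delay is not None else default_delay


parser = build_parser(SAMPLE_ROBOTS_TXT)
for url in ("https://example.com/public/page",
            "https://example.com/private/data"):
    allowed, delay = fetch_plan(parser, url)
    print(url, "allowed" if allowed else "blocked", "wait", delay, "s")
```

In a real crawler the parser would be loaded from the site's own robots.txt (for example with set_url and read), and the scraper would sleep for the returned delay between requests. Passing a robots.txt check does not replace reading the site's actual terms of use.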

Predictive Models

• The complexity of the model needs only to be proportional to the complexity of the problem.
• Producing random trees and generating a forest is a good way to produce a model without systematic error.
• Keep the trees shallow.
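
To make the "forest of shallow trees" idea concrete, here is a small pure-Python sketch. It is a simplified stand-in for a random forest, not the speaker's method: each tree is a depth-1 stump trained on a bootstrap sample using one randomly chosen feature, and the forest predicts by majority vote. The function names and the toy data are invented for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, rng):
    """Fit a depth-1 tree (a stump) on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No usable split in this sample: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, rng))
    return forest


def predict(forest, row):
    """Majority vote across all stumps in the forest."""
    votes = Counter(left if row[feature] <= threshold else right
                    for feature, threshold, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data: class 0 has small feature values, class 1 has large ones.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict(forest, (1, 1)), predict(forest, (9, 9)))
```

Each stump is a weak model on its own. The point of the ensemble is that the stumps are trained on different resamples and features, so their individual mistakes tend to differ and the vote averages them out. Production work would normally use an established library implementation rather than a hand-rolled one like this.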

Aside from big data talk, Strata Conference is also a hub for product launches. To start, we have the Cloudera University announcement. Their Shared Learning Collaborative, an effort to make data work for students, received a lot of support. This open source project is geared towards building technology that helps bring personalized educational materials and powerful tools directly to teachers’ fingertips so that they can easily find the resources, techniques and strategies that will help them meet individual students’ learning needs.

Here are also some interesting statistics about the conference’s attendees, originally compiled by The Guardian. The highlights are as follows:

-Developers traveled an average of 2,346 miles to attend the conference.
-The total company air miles: 2,174,144
-About 83% of attendees are male. Ugh.
-About 33% of the attendees said their organization stores an average of one terabyte of data produced each.

