UPDATED 11:22 EST / MARCH 05 2012

Kaggle Sees Data Science as a Sport

With the tagline “Data science as a sport,” Kaggle helps companies and government agencies, including NASA and Allstate Insurance, develop predictive algorithms for a wide variety of data-dependent questions. It does so by running contests and enlisting large numbers of independent data scientists worldwide to submit entries.

Kaggle was inspired by the Netflix Prize, company President and Chief Scientist Jeremy Howard told Wikibon’s Jeff Kelly in a live webcast interview in The Cube at Strata 2012. Netflix offered the $1 million prize for the best improvement to its recommendation system. The contest attracted about 50,000 entries and was mentioned several times in The New York Times. The winning solution improved on the prediction accuracy of Netflix’s own system by about 10%.

“We realized that this was actually a great way to design predictive modeling for all kinds of problems in science, industry, and government,” Howard said. “So we created a site that helps organizations design and run their own predictive modeling competitions. Rather than having a team of experts spend a year setting up your predictive modeling system, you just fill out a five-step wizard and create your competition.”

In an environment where demand runs very high for the comparatively few data scientists available, this makes predictive modeling available to organizations without the skills in-house. But Kaggle’s clients are not limited to those organizations.

For instance, Kaggle’s internal data scientists worked with domain experts at NASA to design a competition to find the best method for identifying dark matter in the universe from the huge amounts of raw data NASA has captured. The result of the competition was a methodology that improved dark matter mapping threefold. NASA has some of the best data scientists in the world, but they could not do what the 30,000 PhD-level data scientists who participate in Kaggle competitions accomplished in a few weeks.

Allstate Insurance, which has its own internal team of the world’s best actuaries, had Kaggle run a contest to find a better way to predict which drivers would be most likely to crash their cars. The contest yielded a method that was three times as accurate as the one Allstate had developed internally.

Kaggle contest winners have four primary characteristics, Howard said: open-mindedness to new and sometimes oddball ideas, creativity and curiosity about what others are doing in the field, tenacity to stick with the problem even when someone else is ahead in the contest, and top data science skills.

He said that the attraction of this methodology is that it creates “a meritocracy outside the world of sports.

“We all believe in using data, rather than where someone went to school, who talks the loudest, or someone’s title in the organization, to drive decisions. It is a meritocracy in which the person or team with the best solution wins.”

