UPDATED 13:54 EDT / SEPTEMBER 09 2011

LexisNexis Puts Its Hadoop Competitor on GitHub

HPCC Systems, the Apache Hadoop competitor developed at LexisNexis Risk Solutions, has just shared its source code on GitHub. The company announced in June that it would open source the project, and it has now made good on that promise. HPCC Systems released virtual machines running the HPCC platform in June, but now, for the first time, developers will be able to look at the code and customize it to their own ends.

CTO Armando Escalante says the HPCC team had to clean up the code to prepare it for public consumption and create a contributor agreement before the company could publish the source. He also says the company contracted both Black Duck and Palamida to audit the code to make sure everything was properly sourced and licensed.

HPCC stands for High Performance Computing Cluster. HPCC is distinguishing itself from Hadoop with its “SQLish” programming language, called ECL, and its near-real-time query system, called Roxie. Wikibon’s Jeff Kelly compared Hadoop and HPCC in June and concluded that companies that want to get started with big data should take a look at both.

Armando Escalante, CTO of Risk Solutions, said at the GigaOM Structure conference that the company may start offering a data-as-a-service product that would give customers access to cloud-hosted HPCC clusters. He also said the company might make some of LexisNexis’ data sets available for analysis via this service. I’ve previously speculated that Microsoft is taking steps in this direction as well.

Services Angle

While I think data-as-a-service will be an important market in the future, that’s still some time off. But enterprises managing development can learn some more immediately applicable lessons from Escalante and his team’s experience of taking the product open source.

Escalante’s first piece of advice is for development teams to treat all projects as if they were open source, even if they are only used internally. Not only does this make these projects more ready to be open sourced in the future, he says, but it also forces best practices that improve collaboration internally.

Escalante says the HPCC team had to make some changes to the structure of the project to make it work as a GitHub project. They also had to clean up the comments, get rid of dead code that had never actually been used in the project and make various elements more consistent. He recommends writing all code comments with the assumption that the public will eventually see them, and structuring a project as if it were going to land on GitHub eventually. “When you work in an open source manner, you work more efficiently, even internally, because it’s accessible to more developers,” he says.
