UPDATED 10:41 EDT / MARCH 12 2013

Quantcast’s Gift to Open Source: Room to Grow Big Data

Quantcast, by its own account, has been dealing in Big Data since 2006, before it was cool. Jim Kelly, VP of Research and Development at Quantcast, stopped by theCube during Strata last month to give some background on the Quantcast File System (QFS). By offering QFS free to the open source community as an alternative to the Hadoop Distributed File System (HDFS), Quantcast hopes to deliver better cost efficiency at large scale to anyone who adopts it.

QFS started five to six years ago, when Quantcast began developing many technologies internally to handle the volume of data it was receiving. Released in September 2012, it is a direct alternative to HDFS. One problem QFS tries to fix is that Big Data sets tend to grow and carry high operating costs. Power and computing can quickly become a six- to seven-figure monthly operating expense. So a goal for QFS was to build a more efficient file system that makes better use of space.

QFS effectively doubles the storage capacity of a Hadoop cluster compared to stock HDFS.

The #1 challenge in designing a distributed file system is fault tolerance. Software needs to tolerate bits of your data going missing. HDFS handles this by making three copies of the data. QFS instead uses Reed-Solomon encoding, the same error-correction technique used in CDs and DVDs. The space savings are big: QFS stores the equivalent of 1.5 copies, which is half of what HDFS needs.

QFS splits data into six data slices and three parity slices and, by default, writes them to nine separate places. If QFS can read any six of the nine, it can reconstruct the data, so it can lose up to three. With HDFS you can lose only two of the three copies, so QFS has better fault tolerance too.
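The arithmetic behind those two comparisons can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above, not QFS or HDFS code: the function names `replication` and `erasure_coding` are hypothetical, and the sketch does not perform any actual Reed-Solomon encoding.

```python
from fractions import Fraction


def replication(copies):
    """Plain n-way replication (hypothetical helper for illustration).

    Returns (raw storage per byte of data, number of copies that can be lost).
    """
    return Fraction(copies), copies - 1


def erasure_coding(data_slices, parity_slices):
    """Erasure coding with data + parity slices (hypothetical helper).

    Any `data_slices` of the total are enough to rebuild the data, so up to
    `parity_slices` slices can be lost.
    """
    total = data_slices + parity_slices
    return Fraction(total, data_slices), parity_slices


# Figures from the article: three copies vs. six data + three parity slices.
hdfs_factor, hdfs_losses = replication(3)
qfs_factor, qfs_losses = erasure_coding(6, 3)

print(f"3 copies:        {float(hdfs_factor)}x raw storage, survives {hdfs_losses} losses")
print(f"6 data+3 parity: {float(qfs_factor)}x raw storage, survives {qfs_losses} losses")
print(f"Relative space:  {float(qfs_factor / hdfs_factor)} of the replicated footprint")
```

Nine slices for every six slices of data works out to 1.5 times the original size, which is where the "1.5 copies" figure comes from, and half of three copies is what lets the same disks hold roughly twice as much.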

Here are some interesting figures about Quantcast's operations that show host Dave Vellante got Kelly to confirm during the interview:

  • 50 terabytes of data in the door per day
  • over 20 petabytes processed on an average day
  • 1,000 machines (reasonably modest commodity hardware)

While he remained vague, Kelly said that Quantcast would measure success by the number of high-quality collaborators who help extend the product. File systems are an especially critical piece of the infrastructure puzzle. QFS stands to benefit from the scrutiny of open source, and Hadoop will benefit from having an alternative file system on which to run its framework. Giving QFS back to open source is a win-win for all.

See Kelly’s full segment below.

http://youtu.be/3fXArMUBrrQ
