UPDATED 09:50 EDT / JULY 25 2016


Wikibon says it’s too early for Big Data performance benchmarks

Several Wikibon clients have asked about performance benchmarks for Big Data systems. The problem, writes Wikibon Big Data and Analytics Analyst George Gilbert, is that the technology is too immature. Meaningful benchmarks are based on standard workloads, but in Big Data, the analyst writes, the notion of a “common Big Data workload remains an alien concept.” In 2010, for instance, Yahoo! Inc. created the Yahoo! Cloud Serving Benchmark (YCSB) to benchmark scale-out key-value NoSQL databases. That standard appears to be falling out of favor, however, because these databases are now deployed in widely varying scenarios that the benchmark does not cover.

Apache Spark is particularly difficult to benchmark because each new point release enables new classes of complex workloads, “which are neither cheap nor easy to translate into benchmarks,” Gilbert contends. The most commonly used benchmark for Big Data systems today is the Transaction Processing Performance Council’s TPC-DS 2.x, which is designed to measure SQL decision-support performance and can be run against Hadoop. However, the products being tested are so immature that “none that we know of actually can run all 90+ queries in the TPC-DS test suite unmodified.”

In general, attempts to apply existing standard benchmarks to Big Data workloads have been ineffective, and the product benchmarks that have been published typically aren’t useful. Because the technology has not yet matured to the point of having standard workloads, customers often find that published benchmarks do not reflect their own situations.

The full report, available to Wikibon Premium subscribers, examines published benchmarks for several prominent products and points out serious weaknesses in each. Gilbert recommends that users who need benchmarks run their own stylized workloads based on their intended usage scenarios, and not expect the results to compare closely with the workloads or experience of other users.

