Cloudera: Impala is faster than Hive, and here are the numbers to prove it
Cloudera is on a journey to make Hadoop enterprise-ready, and central to that mission is the integration of enterprise capabilities into the Big Data stack. That includes high availability, recoverability and business intelligence features like structured query, a gap that the company is working to bridge with Impala. Launched in October 2012 and released for general availability in May 2013, the open source SQL-on-Hadoop solution is considerably faster than Hive and, according to a new internal benchmark, faster than a leading DBMS as well.
In the first part of the benchmark test, Cloudera pitted Impala 1.1.1 against the latest release of the Hive data warehouse, which runs on YARN, in a 3TB environment consisting of five Hadoop nodes, each with an 8-core processor and 96GB of memory. Impala outperformed Hive by a factor of 6 to 69 across three categories: interactive query, reporting and deep analysis.
Afterwards, Impala 1.2.2 and a parallel relational database, referred to only as DBMS-Y due to “restrictive proprietary licensing agreement” terms, were run against 30TB of TPC-DS data on a 20-node cluster. Impala emerged victorious once again, running about twice as fast as the commercial database on average, though it trailed on 3 of the 20 queries tested.
“Interactive exploratory business intelligence is a mainstay workload of the Enterprise Data Hub,” noted Mike Olson, the founder and chairman of Cloudera. “One year ago, when we released Impala to open source, we knew that it had the potential to eventually play on the same field as some very mature analytic DBMSs, but the results of these performance benchmark tests exceed our very high expectations.”
Notably absent from the study are rival SQL-on-Hadoop solutions like Hadapt and the Hortonworks-sponsored Stinger project, but Cloudera maintains that Impala is the “fastest, most functional and proven way to run SQL on Hadoop data,” with more than 5,000 corporate users in various industries.