UPDATED 16:45 EDT / SEPTEMBER 09 2013

NEWS

R Language Tops the Charts As Most Preferred Language for Data Science and Big Data Analytics

Demonstrating the potential of Big Data technologies requires expertise from different areas. Data science, data mining, and big data analytics are some of the specialist fields that bring together the diverse skills needed to work with big data technologies, products, and services and to optimize the operations of a company. Among those skills are the languages an analyst knows, so when KDnuggets released its survey of languages and skills, we reviewed the results.

Data visualization is an essential skill for every web analyst and data scientist. Data science demands a number of additional skills, most of which cannot be learned in a short time. It requires a very strong general knowledge of statistics, such as Bayesian methods, linear regression, and logistic regression, as well as knowledge of algebra and linear algebra, natural language processing, and predictive analytics (based on machine learning). Most importantly, it requires knowledge of tools and programming languages such as R, Python, and SQL.

KDnuggets has published its annual poll of top languages for analytics, data mining and data science, and just as in the two previous years, the R language is ranked as the most popular. Based on responses from more than 700 voters, R’s share of usage grew 16% this year compared to the 2012 poll. Python and SQL followed R in popularity.

“The most popular languages continue to be R (used by 61% of KDnuggets readers), Python (39%), and SQL (37%). SAS is stable at around 20%. The highest growth was for Pig/Hive/Hadoop-based languages, R, and SQL, while Perl, C/C++, and Unix tools declined,” says the report.

Among the most common languages, the largest relative increases in share of usage were found among Pig Latin/Hive/other Hadoop-based languages with 19% growth, from 6.7% in 2012 to 8.0% in 2013; R with 16% growth; and SQL with 14% growth. Conversely, the languages with the largest declines in share of usage were Lisp/Clojure (77% down), Perl (50% down), Ruby (41% down), C/C++ (35% down), UNIX shell/awk/sed (25% down) and Java (22% down).
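The growth figures above are relative changes in share, not percentage-point differences. As a rough illustration (a minimal sketch, not the survey's own method), the Hadoop-family figure can be reproduced from the two shares quoted above. The function name `relative_change_pct` is our own, chosen only for this example.

```python
def relative_change_pct(old_share: float, new_share: float) -> float:
    """Relative change from old_share to new_share, expressed in percent."""
    if old_share == 0:
        raise ValueError("old_share must be non-zero")
    return (new_share - old_share) / old_share * 100.0


# Shares quoted in the article for Pig Latin/Hive/other Hadoop-based languages.
share_2012 = 6.7
share_2013 = 8.0

hadoop_growth = relative_change_pct(share_2012, share_2013)
point_change = share_2013 - share_2012

print(f"relative growth: {hadoop_growth:.0f}%")
print(f"absolute change: {point_change:.1f} percentage points")
```

A rise of 1.3 percentage points on a base of 6.7% works out to roughly 19% relative growth, which is why a language with a small share can post the largest increase.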

Ben Podgursky, a software engineer at LiveRamp, recently shared statistics on the GitHub community showing that ActionScript yields the highest average household income at $108,119.47, followed by XSLT ($106,199.19), Java ($103,179.39), Groovy ($102,650.86), Objective-C ($101,801.60) and ColdFusion ($101,536.70). Puppet ($87,589.29) and Haskell ($89,973.82) were at the bottom of the list.

Much like Linux, R has had a rather slow but steady evolution. R was created when two university professors wanted an open source system for statistical computing, and it really took off in the academic community, beginning with research projects. Today, R is being used with parallel processing, server clusters, Hadoop, and other cloud technologies.

The mix of skills it supports, spanning database query languages, statistics, predictive and advanced analytics, programming, business intelligence, and cognitive science, makes R a popular language among developers. Today R can scale through Hadoop execution, in-database execution, parallelized user code, parallelized algorithms, multi-core processing, multi-threaded execution, memory management and fast math libraries.

At the same time, Python has been used for building massive web applications and for scientific computing, data structuring, manipulation, querying, analysis, and visualization in highly quantitative domains such as finance, oil and gas, physics, and signal processing. It has powered much of Google’s internal infrastructure. According to the TIOBE Software Index, Python is the 8th most popular programming language, and it is the third most commonly used language on the Internet’s largest code repository (GitHub), ahead of Perl, Ruby, and JavaScript.

