LinkedIn releases its internal Hadoop optimization tool
Less than two weeks after open-sourcing its internal application testing system, LinkedIn Inc. is expanding its open-source software portfolio once again with the release of Dr. Elephant, a performance optimization engine designed to help speed up Hadoop queries. The software was developed to spare the social networking giant’s data science team from having to manually instruct analysts on how to fine-tune their workflows.
The chore took too much time away from the unit's other activities because many of the employees who rely on LinkedIn's internal analytics environment, which includes both Hadoop and Spark, aren't particularly familiar with the cluster's inner workings. Getting a query to run at optimal speed is difficult even for someone who understands the frameworks thoroughly, since performance depends on numerous configuration settings that must each be tuned individually. To complicate matters further, many of those settings are interdependent, so setting one parameter to the wrong value can send a user straight back to square one.
Dr. Elephant promises to do away with much of that hassle by automatically analyzing an analytics cluster's operations logs to identify why queries aren't running as fast as they should. Its findings appear in a visual dashboard that shows how performance fluctuates over the course of a given workflow's execution and compares the speed with previous runs. According to LinkedIn, this lets analysts quickly adjust their jobs until they find the right fix.
The company claims that Dr. Elephant can thereby help resolve 80 percent of the optimization issues that crop up during day-to-day analytics work. That adds up to a lot of saved time for LinkedIn's data science team across the roughly 10,000 Hadoop and Spark jobs that employees run every day, and other organizations can now reap the same benefit. Because the engine is open source, its functionality can be customized to the specific requirements of the analytics cluster in which it's deployed, a major boon for potential adopters.
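To give a rough sense of the kind of rule-based check a log-analysis tool like this might run, and that an adopter could customize, here is a minimal sketch. It is not Dr. Elephant's code: the function name `skew_severity`, the use of per-task runtimes as input, and the threshold values are all hypothetical assumptions chosen for illustration. The idea is that a job whose slowest task takes far longer than a typical task is likely suffering from unevenly distributed data.

```python
from statistics import median

# Hypothetical thresholds (illustrative assumptions, not Dr. Elephant's values):
# ratio of the slowest task's runtime to the median task runtime,
# checked from most to least severe.
THRESHOLDS = [(4.0, "CRITICAL"), (3.0, "SEVERE"), (2.0, "MODERATE"), (1.5, "LOW")]


def skew_severity(task_runtimes_sec):
    """Rate how unevenly work is spread across a job's tasks (hypothetical heuristic)."""
    if not task_runtimes_sec:
        return "NONE"
    typical = median(task_runtimes_sec)
    if typical <= 0:
        return "NONE"
    # How many times longer the slowest task ran than a typical task.
    ratio = max(task_runtimes_sec) / typical
    for limit, label in THRESHOLDS:
        if ratio >= limit:
            return label
    return "NONE"


if __name__ == "__main__":
    # One straggler task running roughly five times longer than the rest.
    print(skew_severity([60, 62, 58, 61, 300]))
```

Customizing such a check for a particular cluster would mostly mean adjusting the thresholds or swapping in different log-derived metrics.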
Image via Geralt