IBM commits 3,500 engineers to Apache Spark
Amid the new features and integrations introduced at its third annual community meetup this morning, Apache Spark is marking a landmark new endorsement from IBM, which has decided to back the project with more than 3,500 engineers who will now actively participate in the development of new functionality. The opening contribution of the initiative is a machine learning library called SystemML.
The technology is one of the latest innovations to have emerged from the company’s ongoing work on Watson, which has seen its use expand from answering trivia questions to extracting complicated patterns out of vast quantities of unstructured data over the last few years. To keep up, SystemML provides a language that directly exposes the capabilities of the artificial intelligence for data scientists to harness.
Queries written in the syntax, which is deliberately modeled after the widely used R statistical programming framework, are automatically executed according to the most efficient mode of operation for the specific workload and operational characteristics of a Spark cluster. Needless to say, that has the potential to provide a tremendous boost for the project’s machine learning capabilities.
But SystemML still only represents the tip of the iceberg for IBM’s plans. The bulk of its efforts will focus on integrating Spark into its analytics arsenal, beginning with none other than Watson. The cloud-based incarnation of the artificial intelligence that the company released for the healthcare sector earlier this year is first in line to be standardized on the framework, with other versions presumably due to follow suit later on.
At the same time, IBM is also embedding Spark into its Bluemix platform-as-a-service stack, which will make the capabilities of the framework accessible on demand for developers and data scientists. Through a number of education partnerships announced in conjunction, the company hopes to bring the total number of professionals skilled in using the project to over a million within a few years, and it hopes those users will tilt toward its implementation over the competition as a result.
Added up, IBM’s commitment to Spark arguably represents the biggest milestone for the project since its inception at UC Berkeley four years ago. The framework is already a fixture of the analytics discussion thanks to its speed and extensibility, but if Big Blue’s past kingmaking role in other open-source projects such as Linux is anything to go by, its addition to the fray could take that to a whole different level.
Photo: Ariel Zambelich/Wired