UPDATED 15:00 EDT / MAY 25 2018


Analysis by paralysis: Why data should stay put in AI infrastructure

Artificial intelligence and machine learning software geared for data analysis is flooding the market, and companies are buying. But a close examination reveals that the vast majority of big-data projects implode and most data scientists spend little time actually doing analytics. What are all of these people doing wrong?

“The bigger movement here is that recent advances in technology have really rehighlighted a focus on organizations getting more out of their data of all forms,” said Rob Lee (pictured), vice president and chief architect of Pure Storage Inc.

Lee spoke with Dave Vellante (@dvellante) and Lisa Martin (@LuccaZara), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the Pure Storage Accelerate event in San Francisco. They discussed the holdups in big data and AI and how companies can bust through them. (* Disclosure below.)

On the one hand, the wide availability of advanced algorithms is democratizing AI for all businesses, Lee pointed out. On the other, two factors will separate the sprinters from the hobblers in the race to big-data success. The first is the sheer wealth of data in a company’s possession, he said, pointing to Google Inc. as the obvious proof point.

“The takeaway point there is having a lot of data trumps having the best algorithm, and we expect that to continue as AI research and algorithms continue to evolve,” Lee stated.

The second is infrastructure that ties AI algorithms and applications together to deliver insight or action sometime before next Christmas.

What type of infrastructure should companies serious about AI be looking at? “It’s all about simplicity; it’s all about removing friction and bottlenecks,” Lee said. A commonly cited statistic is that data scientists spend 80 percent of their time wrangling, funneling and transporting data in various forms rather than doing real analytics. “And the other 20 percent is spent complaining about the first 80 percent,” Lee joked.

“If you take a look at an AI pipeline to do something like training an object detection system for self-driving cars, that pipeline, that simple sentence, may encapsulate 30 or 40 different applications,” he said.

In that scenario, human intervention has to be replaced by automation as much as possible. “Without an infrastructure to make it easy to centralize the data-management portion of that, you’ve also potentially got 30 or 40 different data silos,” Lee said. This requires new data-centric architectures (as well as practices and processes) built around the idea that data is very difficult to move.

“You want to move it as few times as possible, manage it as little as possible,” Lee said.

Here’s the complete video interview, and there’s more coverage on SiliconANGLE and theCUBE. (* Disclosure: TheCUBE is a paid media partner for the Pure Storage Accelerate event. Neither Pure Storage Inc., the event sponsor, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

