UPDATED 17:01 EDT / MARCH 12 2019

Google debuts miniaturized, real-time speech recognition AI on Pixel phones

Google LLC has developed a neural network that is small and efficient enough to perform speech recognition, a normally hardware-intensive task, directly on mobile devices.

The technology debuted today on the company’s Pixel smartphones. Google has rolled it out to its Gboard virtual keyboard app as part of an update that will make the built-in voice dictation feature usable when a device doesn’t have internet access.

Previously, the feature required a steady connection because the app offloaded much of the computational heavy lifting to the cloud. That is still a requirement for other services that use artificial intelligence to process speech, because turning spoken words into text normally requires several software components that are too complex to run on a handset.

In a blog post, Google researcher Johan Schalkwyk said previous iterations of Gboard used no fewer than three separate AI models. The first was responsible for organizing raw audio into phonemes, the smallest units of spoken language, while the second stitched those phonemes together into words. The words were then fed to a third model that output complete phrases.
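To make the division of labor concrete, here is a minimal Python sketch of a three-stage pipeline of the kind described above. It is a toy illustration, not Google’s code: the function names, the frame labels and both lookup tables are invented, and real systems use statistical models rather than dictionaries.

```python
from typing import Dict, List, Tuple

# Invented lookup tables standing in for trained models (assumptions for illustration).
FRAME_TO_PHONEME: Dict[str, str] = {
    "f1": "HH", "f2": "AY", "f3": "DH", "f4": "EH", "f5": "R",
}
PRONUNCIATIONS: Dict[Tuple[str, ...], str] = {
    ("HH", "AY"): "hi",
    ("DH", "EH", "R"): "there",
}


def acoustic_model(frames: List[str]) -> List[str]:
    """Stage 1: map each audio frame to a phoneme."""
    return [FRAME_TO_PHONEME[frame] for frame in frames]


def pronunciation_model(phonemes: List[str]) -> List[str]:
    """Stage 2: stitch phonemes into words using greedy longest match."""
    words: List[str] = []
    start = 0
    while start < len(phonemes):
        # Try the longest remaining phoneme span first, then shorter ones.
        for end in range(len(phonemes), start, -1):
            word = PRONUNCIATIONS.get(tuple(phonemes[start:end]))
            if word is not None:
                words.append(word)
                start = end
                break
        else:
            raise ValueError(f"no word matches phonemes from index {start}")
    return words


def language_model(words: List[str]) -> str:
    """Stage 3: assemble words into a phrase (here, simply joined by spaces)."""
    return " ".join(words)


def recognize(frames: List[str]) -> str:
    """Run the three stages in sequence, each consuming the previous output."""
    return language_model(pronunciation_model(acoustic_model(frames)))
```

Calling `recognize(["f1", "f2", "f3", "f4", "f5"])` passes the frames through all three stages in turn. Each stage depends on the full output of the one before it, which is part of why such pipelines are heavier than a single model.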

Google has managed to consolidate these three models into a single neural network that handles the entire process from start to finish. Moreover, the AI processes voice in real time as the user speaks.

“The model works at the character level, so that as you speak, it outputs words character-by-character, just as if someone was typing out what you say in real-time, and exactly as you’d expect from a keyboard dictation system,” Google’s Schalkwyk wrote.
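The character-by-character behavior Schalkwyk describes can be sketched as a Python generator that emits the growing transcript as each piece of audio arrives. This is only an illustration of the streaming idea: `toy_step` is a hypothetical stand-in for the neural network, and the convention that `"_"` means “no character yet” is an assumption made for the example.

```python
from typing import Iterable, Iterator, Optional


def toy_step(chunk: str) -> Optional[str]:
    """Hypothetical stand-in for one model step.

    Returns a character for the audio chunk, or None when the chunk
    does not yet produce any output (marked here by "_").
    """
    return None if chunk == "_" else chunk


def stream_transcribe(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the transcript so far each time a new character is produced."""
    text = ""
    for chunk in chunks:
        char = toy_step(chunk)
        if char is not None:
            text += char
            yield text  # the caller can display this immediately


# Usage: print partial results as they arrive, without waiting for the end.
for partial in stream_transcribe(["h", "_", "i"]):
    print(partial)
```

Because the function yields after every character, a keyboard app could update the text field while the user is still speaking instead of waiting for the whole utterance.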

In addition to streamlining the speech recognition workflow, the search giant has shrunk Gboard’s decoder graph, a key component responsible for coordinating the entire process. Google reduced its size by a factor of 25, from 2 gigabytes in previous iterations of the app to just 80 megabytes.
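The factor of 25 follows directly from the two sizes quoted, provided a gigabyte is counted as 1,000 megabytes. A short Python check, with a hypothetical helper name:

```python
def reduction_factor(before_mb: float, after_mb: float) -> float:
    """Return how many times smaller the new size is than the old one."""
    return before_mb / after_mb


# Decimal units: 2 GB = 2,000 MB.
decimal_factor = reduction_factor(2 * 1000, 80)

# Binary units (an alternative assumption): 2 GB = 2,048 MB.
binary_factor = reduction_factor(2 * 1024, 80)

print(decimal_factor, round(binary_factor, 1))
```

With decimal units the ratio is exactly 25; with binary units it comes to about 25.6, so the figure in the article holds either way as a round number.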

The company believes the technology could eventually be extended beyond Gboard to other applications and use cases. Schalkwyk wrote that “given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application.”

Photo: Tinh tế Photo/Flickr
