Google announces Translatotron tool for translating speech in the speaker’s original voice
Google LLC today introduced what it says is an experimental new system for speech translation that removes many of the steps involved in its earlier models.
Even better, the synthesized translations it produces retain the sound of the original speaker’s voice, so it actually sounds like the person is speaking in the target language.
Google said its Translatotron tool simplifies the complex process of translating speech into different languages. Existing systems such as Google Translate work in a roundabout, three-stage way: first transcribing the original speech into text, then translating that text into the target language, and finally using the translated text to synthesize speech in the target language.
Each of these steps can slow things down. Translatotron speeds things up by using a single model that eliminates the need to transcribe speech into text first.
“This system avoids dividing the task into separate stages,” Google AI engineers Ye Jia and Ron Weiss wrote in a blog post. The result should be faster translation and fewer compounding errors, they said.
“To the best of our knowledge, Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language,” Jia and Weiss added. “It is also able to retain the source speaker’s voice in the translated speech.”
The Translatotron system takes “spectrograms,” visual representations of how the frequency spectrum of an audio signal varies over time, as its input and generates spectrograms of the translated speech as its output. A separate speaker encoder network captures the speaker’s voice, while “multitask learning” trains the model to also predict the words being spoken in the source and target languages.
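For readers unfamiliar with spectrograms, the idea can be illustrated with a minimal Python sketch. This is not Google's code: the function name `spectrogram`, the frame and hop sizes, and the test tone are all assumptions chosen for illustration, and it computes a plain short-time Fourier magnitude rather than whatever feature extraction Translatotron actually uses.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin.

    Hypothetical helper for illustration only; frame_len and hop are arbitrary.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # The FFT of each frame gives its frequency content at that moment.
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 1,000 Hz tone sampled at 8,000 Hz (made-up test input).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index -> frequency in Hz
print(spec.shape, peak_hz)
```

Each row of `spec` describes the frequencies present in a short slice of audio, which is the kind of time-by-frequency grid a model can take as input in place of text.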
Google admits the system is still experimental and that, as measured by BLEU score, a standard metric of machine translation quality, its accuracy is currently lower than that of conventional translation tools. However, Google said it’s working to improve the system.
Analyst Holger Mueller of Constellation Research Inc. told SiliconANGLE that Translatotron was an interesting concept, noting that transcription is becoming table stakes for cloud providers.
“The combination of understanding speech and then translating it to a desired language is raising the game and that’s what Google is doing with the Translatotron,” Mueller said. “We are getting close to the point where kids will be asking why they should even bother with learning a foreign language.”
Indeed, within a few years it really might not be necessary to speak more than one language. One possible application for Translatotron could be the new “Interpreter Mode” found in Google Assistant, which was added to Google Home speakers earlier this year. Interpreter Mode currently relies on Google’s conventional translation tools and can translate speech between 27 language pairs.
For a more in-depth look at how Translatotron works, Google has a whitepaper on the subject.