Last October, Google made Cloud Text-to-Speech with realistic WaveNet voices from DeepMind available to all developers. Updates to Cloud TTS and Speech-to-Text today introduce additional languages, voices, and more affordable pricing models.
Cloud Speech-to-Text (speech recognition) is important when building voice applications and devices, and the API is also useful for transcribing video and in call-center settings. When Google launched its premium enhanced models last year, it asked customers to share usage data to help improve their accuracy.
Google says the resulting enhanced phone model now makes 62% fewer transcription errors (up from a 54% reduction last year), while the video model, which is based on technology similar to what YouTube uses for automatic captioning, makes 64% fewer errors.
The enhanced phone model is now also available without data logging, though at a higher price, while the existing option with data sharing enabled to improve accuracy is now 33% cheaper. The video model, along with multi-channel recognition for audio with more than one speaker, is also entering general availability (GA) with an SLA and enterprise-level guarantees.
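To make the model and multi-channel options more concrete, here is a minimal sketch that only builds a JSON request body for the Speech-to-Text v1 REST `speech:recognize` method. It assumes the v1 field names `model`, `useEnhanced`, `audioChannelCount`, and `enableSeparateRecognitionPerChannel`; the helper `build_recognize_request` and the `gs://example-bucket/...` URI are hypothetical, and authenticating and sending the request are not shown.

```python
import json


def build_recognize_request(audio_uri: str, model: str = "phone_call",
                            channels: int = 1,
                            sample_rate_hertz: int = 8000) -> dict:
    """Hypothetical helper: build a request body for the v1 REST
    `speech:recognize` method (field names assumed from the v1 API)."""
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": sample_rate_hertz,  # phone audio is often 8 kHz
        "languageCode": "en-US",
        "model": model,         # e.g. "phone_call" or "video"
        "useEnhanced": True,    # ask for the enhanced (premium) variant
    }
    if channels > 1:
        # Multi-channel recognition: transcribe each audio channel separately,
        # e.g. caller and agent recorded on different channels of one call.
        config["audioChannelCount"] = channels
        config["enableSeparateRecognitionPerChannel"] = True
    return {"config": config, "audio": {"uri": audio_uri}}


if __name__ == "__main__":
    body = build_recognize_request("gs://example-bucket/call.wav", channels=2)
    print(json.dumps(body, indent=2))
```

Keeping the request as a plain dictionary makes it easy to inspect before posting it with whatever authenticated HTTP client a project already uses.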
On the synthesis side, Cloud Text-to-Speech is now available in beta for seven new languages and variants: Danish, Portuguese (Portugal), Russian, Polish, Slovak, Ukrainian, and Norwegian Bokmål.
This brings Cloud TTS to 21 languages, and with 31 new WaveNet voices and 24 new standard voices also announced today, it now offers 106 voices in all. Also entering GA today are Device Profiles, which optimize audio playback for different types of hardware, such as headphones and interactive voice response (IVR) phone systems.
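As an illustration of how a Device Profile and one of the new languages might be selected together, the sketch below builds a JSON request body for the Text-to-Speech v1 REST `text:synthesize` method. It assumes the v1 field `audioConfig.effectsProfileId` (a list of profile IDs such as `headphone-class-device` or `telephony-class-application`); the helper `build_synthesize_request` is hypothetical, and authentication and the actual HTTP call are left out.

```python
import json
from typing import Optional


def build_synthesize_request(text: str, language_code: str, profile: str,
                             voice_name: Optional[str] = None) -> dict:
    """Hypothetical helper: build a request body for the v1 REST
    `text:synthesize` method with a Device Profile applied."""
    voice = {"languageCode": language_code}  # e.g. "da-DK" or "pt-PT"
    if voice_name is not None:
        voice["name"] = voice_name  # optionally pin a specific voice
    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": "MP3",
            # Device Profiles are passed as a list of profile IDs (assumed
            # v1 field name); the audio is post-processed for that hardware.
            "effectsProfileId": [profile],
        },
    }


if __name__ == "__main__":
    body = build_synthesize_request("Hej verden", "da-DK",
                                    "headphone-class-device")
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

Leaving `voice_name` optional lets the service pick a default voice for the language, which is convenient while experimenting with the newly added locales.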