For the past several releases, Google has been working on “faster voice typing” in Gboard for Android that works offline. Google is today making it official on Pixel phones and detailing the “end-to-end, all-neural, on-device speech recognizer” it created.
Google notes that a speech recognition “revolution” began in 2012 thanks to significant accuracy improvements with deep learning. The “prime focus” of the various architectures that followed was to reduce latency, the time it takes for a user’s speech to be transcribed. Google notes how “an automated assistant feels a lot more helpful when it responds quickly to requests.”
The latest development from Google is an end-to-end, all-neural, on-device speech recognizer that handles voice typing in Gboard, which users start by tapping the microphone icon in the top-right corner. It works completely offline and is only 80MB in size, compared to past models that were 2GB and later 450MB.
This means no more network latency or spottiness — the new recognizer is always available, even when you are offline. The model works at the character level, so that as you speak, it outputs words character-by-character, just as if someone was typing out what you say in real-time, and exactly as you’d expect from a keyboard dictation system.
Besides working offline, the new system outputs text character by character instead of one word at a time.
The RNN-T recognizer outputs characters one-by-one, as you speak, with white spaces where appropriate. It does this with a feedback loop that feeds symbols predicted by the model back into it to predict the next symbols, as described in the figure below.
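The feedback loop Google describes can be illustrated with a minimal sketch. This is not Google's implementation: `toy_predictor` is a hypothetical stand-in for the neural network, the "audio frames" are plain strings holding the text heard so far, and `BLANK`, `greedy_decode`, and `max_symbols_per_frame` are names assumed for illustration. The only point is the structure, in which every emitted character is fed back in before the next one is predicted.

```python
from typing import Callable, Iterator, List, Sequence

BLANK = ""  # hypothetical "nothing more to emit for this frame" symbol

# A predictor sees the current audio frame plus every character emitted so far.
Predictor = Callable[[str, List[str]], str]


def toy_predictor(frame: str, history: List[str]) -> str:
    """Stand-in for the neural network (an assumption for illustration).

    Each toy frame is simply the text heard so far, so the next character
    is the first one that the emitted history does not cover yet.
    """
    if len(history) < len(frame):
        return frame[len(history)]
    return BLANK


def greedy_decode(
    frames: Sequence[str],
    predict: Predictor,
    max_symbols_per_frame: int = 5,
) -> Iterator[str]:
    """Yield characters one by one, feeding each prediction back in."""
    history: List[str] = []
    for frame in frames:
        # Several characters may be emitted for a single frame.
        for _ in range(max_symbols_per_frame):
            symbol = predict(frame, history)
            if symbol == BLANK:
                break  # nothing more to say yet; wait for the next frame
            history.append(symbol)  # feedback: the output becomes an input
            yield symbol


FRAMES = ["he", "hello", "hello wo", "hello world"]
transcript = "".join(greedy_decode(FRAMES, toy_predictor))
print(transcript)
```

Because `greedy_decode` is a generator, a caller can display each character as soon as it is produced, which mirrors the streaming, character-by-character behavior described above. A real recognizer would replace `toy_predictor` with a trained network scoring every possible symbol.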
Today’s advancement is due to various components of the speech recognition system being merged into one. A single neural network can “directly map an input audio waveform to an output sentence.”
The new on-device voice typing is rolling out now, initially to Pixel, Pixel 2, and Pixel 3 phones set to American English. To enable it, head to Gboard settings > Voice typing > Faster voice typing. Google expects the feature to come to more languages and, later, to other use cases.