Google has created an offline speech recognition system that is faster and more accurate than a comparable system connected to the Internet. While research papers are usually very theoretical, this new system is already running and has been tested on a Nexus 5.
Currently, the Android Google app has very limited offline capabilities. More advanced commands need to be sent to a server and processed there. This results in high latency, and requests often fail completely on unreliable networks. The alternative is “an embedded speech recognition system that runs locally on a mobile device.” However, such a system might not be accurate and can consume significant memory and other resources.
Using various machine learning techniques, Google has created a 20.3MB system that is 7x faster than a system connected to the Internet and has a word error rate of only 13.5%. It was implemented and tested on a two-year-old Nexus 5 with a quad-core 2.26GHz processor and 2GB of RAM.
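For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in that transcript. The following is a minimal Python sketch of that standard calculation. The function name and the example sentences are illustrative only and are not taken from Google's paper or evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of five gives a 20% word error rate.
    print(word_error_rate("turn on the kitchen lights",
                          "turn on the kitchen light"))
```

By this measure, a 13.5% word error rate means roughly one word in seven or eight needs correcting.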
To achieve this size and save on resources, the system uses a single model for both dictation and voice commands. Additional compression techniques were used to get the size down. The system was trained on 3 million anonymized voice samples (approximately 2,000 hours) from Google voice search. For each voice sample, 20 distorted versions were also created by adding noise extracted from YouTube videos.
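One way to picture the distorted copies is as the original utterance with a noise clip mixed in at a chosen loudness. The sketch below is only an illustration of that general idea, not Google's pipeline. The function names, the signal-to-noise range, and the random selection of clips are all assumptions, and audio is represented as plain lists of float samples to keep the example self-contained.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has the requested SNR in dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise clip so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    noise_level = rms(tiled)
    if noise_level == 0:
        return list(speech)  # silent noise clip: nothing to add
    # Scale the noise so that rms(speech) / rms(scaled noise) matches the SNR.
    gain = rms(speech) / (10 ** (snr_db / 20)) / noise_level
    return [s + gain * n for s, n in zip(speech, tiled)]


def distorted_versions(speech, noise_clips, count=20,
                       snr_range=(5.0, 25.0), seed=0):
    """Create `count` noisy copies of one utterance (assumed settings)."""
    rng = random.Random(seed)
    return [
        mix_at_snr(speech, rng.choice(noise_clips), rng.uniform(*snr_range))
        for _ in range(count)
    ]


if __name__ == "__main__":
    # Synthetic stand-ins for a real utterance and real noise clips.
    utterance = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    noise_rng = random.Random(1)
    clips = [[noise_rng.uniform(-1, 1) for _ in range(4000)] for _ in range(3)]
    copies = distorted_versions(utterance, clips)
    print(len(copies), "distorted copies of", len(utterance), "samples each")
```

At 20 copies per sample, 3 million utterances become 60 million noisy training examples, which helps a recognizer cope with real-world background sound.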
Hopefully, such improvements will make their way into current phones in the not-so-distant future. The paper notes that such a system is not limited to phones and can also be used on wearable devices. Check out the research paper for full technical details.