Speech synthesis technology has advanced a great deal in recent years, with DeepMind's neural networks proving especially adept at creating realistic, human-sounding voices. As with any technology, it can be abused, and Google is working to advance state-of-the-art research on fake audio detection.
Last year, the Google News Initiative (GNI) announced that it wanted to help tackle “deep fakes” and other systems that try to bypass voice authentication systems.
Malicious actors may synthesize speech to try to fool voice authentication systems, or they may create forged audio recordings to defame public figures. Perhaps equally concerning, public awareness of “deep fakes” (audio or video clips generated by deep learning models) can be exploited to manipulate trust in media: as it becomes harder to distinguish real from tampered content, bad actors can more credibly claim that authentic data is fake.
Working with Google AI, GNI today released a dataset of synthetic speech containing thousands of phrases spoken by its deep learning text-to-speech (TTS) models. The corpus features 68 synthetic "voices" spanning a variety of regional accents, reading from English newspaper articles.
The dataset is available to participants in the 2019 ASVspoof challenge to create "countermeasures against fake (or 'spoofed') speech, with the goal of making automatic speaker verification (ASV) systems more secure."
By training models on both real and computer-generated speech, ASVspoof participants can develop systems that learn to distinguish between the two. The results will be announced in September at the 2019 Interspeech conference in Graz, Austria.
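The real-versus-synthetic distinction described above is, at its core, a binary classification task over acoustic features. The sketch below illustrates the idea with a minimal logistic-regression "countermeasure" trained in NumPy; the Gaussian feature vectors are purely illustrative stand-ins for per-utterance acoustic features (real ASVspoof systems extract features such as CQCCs or MFCCs from actual audio), and the separation between the two classes is an assumption of this toy setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-utterance acoustic features. Real systems would
# extract spectral features (e.g. CQCCs, MFCCs) from the challenge audio;
# these Gaussians, and the offset between them, are purely illustrative.
n, d = 200, 8
real = rng.normal(0.0, 1.0, size=(n, d))   # "bona fide" human speech
spoof = rng.normal(0.6, 1.0, size=(n, d))  # synthetic (TTS) speech

X = np.vstack([real, spoof])
y = np.concatenate([np.zeros(n), np.ones(n)])  # label 1 = spoofed

# Minimal logistic-regression countermeasure, trained by gradient descent.
w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(spoof | features)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

preds = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
accuracy = np.mean(preds == y)
print(f"training accuracy: {accuracy:.2f}")
```

Because the toy classes overlap, even a well-fit model cannot reach perfect accuracy; the point is only that exposure to both real and computer-generated examples is what lets a detector learn the boundary between them.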
This effort also reflects Google's AI Principles, which call for "strong safety practices to avoid unintended results that create risks of harm."