Google looking at AI, transcription to search, analyze, and instantly translate podcasts

Abner Li | Apr 27 2018 - 9:09 am PT

Just last month, the Google app significantly updated its built-in podcast player with a homepage and subscriptions. An interview series this week revealed that Google has grander podcast ambitions centered around AI and transcription, which would allow for semantic analysis and mass search.

Speaking to Pacific Content, Google Podcasts product manager Zack Reneau-Wedeen discussed the future in the fifth and final part of his long interview. With the caveat that this “vision here is probably a little more long-term,” Google could one day “transcribe the podcast and use that to understand more details about the podcast, including when they are discussing different topics in the episode.”

Google already has that transcription technology, especially with the latest version of Cloud Speech-to-Text announced earlier this month. Part of Google Cloud, this speech recognition service is available to third parties for use in call centers and for transcribing sports games.

In the latter case, Cloud Speech-to-Text is already rated for audio with more than four speakers, background noise, and over two hours in length. Given that podcasts have the same level of audio quality as television broadcasts, applying it to podcasts is not too far-fetched a possibility.

Being able to mass transcribe podcasts opens up a number of possibilities, including timestamps, indexing the contents, and making text easily searchable. For example, timestamps could allow users to jump right into a section from an Assistant or Search result.
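As a rough illustration of how timestamps and indexing could fit together, here is a minimal Python sketch. It is not Google's implementation: the transcript segments, the `build_index` and `search` helpers, and the data layout are all hypothetical, and a real speech-to-text service would deliver timing information in its own format.

```python
import re
from collections import defaultdict

# Hypothetical transcript: (start time in seconds, text of that segment).
# These values are made up for illustration only.
SEGMENTS = [
    (0.0, "Welcome back to the show"),
    (95.5, "Let's talk about the Impossible Burger"),
    (312.0, "Next up, training camp and the season ahead"),
    (540.25, "One more thought on the burger before we go"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each word to the set of segment start times where it is spoken."""
    index = defaultdict(set)
    for start, text in segments:
        for word in tokenize(text):
            index[word].add(start)
    return index


def search(index, query):
    """Return sorted start times of segments containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(SEGMENTS)
    # Each result is an offset a player could seek to.
    print(search(index, "Impossible Burger"))
```

Each returned value is an offset in seconds, so a search result could link straight to the moment a topic comes up rather than to the start of the episode.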

Transcription would also allow Google to “understand” the topic and what is being discussed, similar to how the Knowledge Graph provides answers because it’s aware of the relationship between things.

Suppose you’re a Packers fan and you asked a smart speaker, ‘How does The Impossible Burger taste?’ What if you actually got Aaron Rodgers telling you what he thinks of The Impossible Burger?

…hearing it from a voice that you recognize and a personality that you’re familiar with and trust could be a really cool experience.

Transcription also allows for features like “Lookahead Scrubbing,” which would be the equivalent of scrubbing with previews on a video clip. Reneau-Wedeen noted that Google is “interested in exploring” this feature, which would “preview as you scrub” for more precise navigation.

Translation is another possibility, with the transcription allowing for Text-to-Speech, another existing Google capability offered to third-party developers. At the end of the day, Reneau-Wedeen notes that the Google app shortcut is the “jumping off point” for more of these exciting features in the “coming months and years.”
