Google updating Search Live with Gemini 2.5 Flash Native Audio

Google today announced the latest version of Gemini 2.5 Flash Native Audio. In addition to powering live headphone translation in Google Translate, the upgraded model is coming to Search Live in AI Mode.

As with Gemini Live last month, Search Live responses will now be “more fluid and expressive than ever before.” This includes more natural-sounding voices and the ability to slow down a response just by asking.

Gemini 2.5 Flash Native Audio is rolling out over the next week to all Search Live (Android + iOS) users in the US.

Today’s updates are also available to third-party developers building live voice agents. Compared to the previous version, there are three improvements:

  • Sharper function calling: We’ve improved the model’s reliability when triggering external functions. It can now more accurately identify when to fetch real-time information during a conversation and seamlessly weave that data back into the audio response, without breaking the flow.
  • Robust instruction following: The model is now better at handling complex instructions, resulting in higher user satisfaction on content completeness. With a 90% adherence rate to developer instructions (up from 84%), it delivers more reliable outputs.
  • Smoother conversations: We’ve achieved significant gains in multi-turn conversation quality. Gemini 2.5 Flash Native Audio is able to retrieve context from previous turns more effectively, creating more cohesive conversations.
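
For developers, the function-calling flow in the first bullet can be pictured roughly as follows. This is a minimal sketch that does not use the Gemini API: the tool `get_weather`, the registry `TOOLS`, the helper `handle_tool_call`, and the shape of the call dictionary are all hypothetical and chosen only for illustration. It shows the general idea of a model asking for real-time data mid-conversation and the application handing a result back.

```python
import json

# Hypothetical tool; a real voice agent would query a live data source here.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "high_c": 21}

# Registry mapping tool names the model may request to local callables.
TOOLS = {"get_weather": get_weather}

def handle_tool_call(call: dict) -> dict:
    """Run the function named in a model-issued call and wrap the result."""
    name = call["name"]
    if name not in TOOLS:
        # Report unknown tools instead of raising, so the conversation can continue.
        return {"name": name, "error": f"unknown tool: {name}"}
    result = TOOLS[name](**call.get("args", {}))
    return {"name": name, "response": result}

# Simulated request, shaped like what a model might emit during a conversation
# (the shape is an assumption, not the actual wire format).
simulated_call = {"name": "get_weather", "args": {"city": "Austin"}}
print(json.dumps(handle_tool_call(simulated_call)))
```

In a real agent, the wrapped result would be sent back to the model, which would then work it into its spoken reply.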

The other upgrade is support for live speech-to-speech translation. As seen with today’s Google Translate update, Gemini can translate “between two languages in real-time, automatically switching the output language based on who is speaking.”

For example, if you speak English and want to chat with a Hindi speaker, you’ll hear English translations in real-time in your headphones, while your phone broadcasts Hindi when you’re done speaking.
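
The direction switching in that example can be sketched as a small routing rule. This is an illustration only, not how Google Translate or the Gemini API implements it: the `Route` type, the `route_utterance` function, and the output names `"headphones"` and `"phone_speaker"` are hypothetical, and language detection is assumed to have already happened.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    target_language: str  # language to translate the utterance into
    output: str           # "headphones" or "phone_speaker" (hypothetical names)

def route_utterance(detected: str, mine: str, theirs: str) -> Route:
    """Pick the translation direction and audio output from the detected language."""
    if detected == mine:
        # The user spoke: translate for the other person and play it aloud.
        return Route(target_language=theirs, output="phone_speaker")
    if detected == theirs:
        # The other person spoke: translate for the user, privately.
        return Route(target_language=mine, output="headphones")
    raise ValueError(f"unexpected language: {detected}")

# English-speaking user talking with a Hindi speaker.
print(route_utterance("hi", mine="en", theirs="hi"))
print(route_utterance("en", mine="en", theirs="hi"))
```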

Notably, the resulting translation preserves the speaker’s intonation, pacing, and pitch, while filtering out ambient noise. It supports automatic language detection and multilingual input. 

There’s support for over 70 languages and 2,000 language pairs by “combining Gemini model’s world knowledge and multilingual capabilities with its native audio capabilities.”

Author

Abner Li

Editor-in-chief. Interested in the minutiae of Google and Alphabet. Tips/talk: abner@9to5g.com