Google updating Search Live with Gemini 2.5 Flash Native Audio

Abner Li | Dec 12 2025 - 10:55 am PT

Google today announced the latest version of Gemini 2.5 Flash Native Audio. Besides powering live headphone translation in Google Translate, the model upgrades will benefit AI Mode’s Search Live.

Like Gemini Live last month, Search Live responses will now be “more fluid and expressive than ever before.” This includes voices that sound more natural and the ability to slow down the response just by asking.

Gemini 2.5 Flash Native Audio is rolling out over the next week to all Search Live (Android + iOS) users in the US.

Today’s updates are also available to third-party developers building live voice agents. Compared to the previous version, there are three improvements:

Sharper function calling: We’ve improved the model’s reliability when triggering external functions. It can now more accurately identify when to fetch real-time information during a conversation and seamlessly weave that data back into the audio response, without breaking the flow.
Robust instruction following: The model is now better at handling complex instructions resulting in higher user satisfaction on content completeness. With a 90% adherence rate to developer instructions (up from 84%), it delivers more reliable outputs.
Smoother conversations: We’ve achieved significant gains in multi-turn conversation quality. Gemini 2.5 Flash Native Audio is able to retrieve context from previous turns more effectively, creating more cohesive conversations.
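
To make the function-calling improvement concrete, here is a minimal sketch of the developer side of that loop: the model asks for a named function with arguments, the app runs it, and the result is handed back so the model can speak it. The call shape (`name`, `args`), the `get_weather` tool, and the `handle_function_call` helper are all hypothetical illustrations, not the Gemini Live API.

```python
from typing import Any, Callable, Dict


def get_weather(city: str) -> Dict[str, Any]:
    """Hypothetical tool; a real agent would query a live data source here."""
    return {"city": city, "forecast": "sunny", "high_c": 21}


# Registry of functions the voice agent exposes to the model (assumed design).
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"get_weather": get_weather}


def handle_function_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run the tool the model requested and package the result for the model."""
    name = call["name"]
    tool = TOOLS.get(name)
    if tool is None:
        # Report unknown functions back instead of crashing mid-conversation.
        return {"name": name, "error": f"unknown function: {name}"}
    return {"name": name, "response": tool(**call.get("args", {}))}


# Assumed shape of a model-issued call during a conversation.
example_call = {"name": "get_weather", "args": {"city": "Mumbai"}}
result = handle_function_call(example_call)
print(result)
```

Returning an error payload for unknown names, rather than raising, is one way to keep the audio session alive when the model requests a function the app does not provide.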

The other upgrade is support for live speech-to-speech translation. As seen with today’s Google Translate update, Gemini can translate “between two languages in real-time, automatically switching the output language based on who is speaking.”

For example, if you speak English and want to chat with a Hindi speaker, you’ll hear English translations in real time in your headphones, while your phone broadcasts Hindi when you’re done speaking.

Notably, the resulting translation preserves the speaker’s intonation, pacing, and pitch, while filtering out ambient noise. It supports automatic language detection and multilingual input.

There’s support for over 70 languages and 2,000 language pairs by “combining Gemini model’s world knowledge and multilingual capabilities with its native audio capabilities.”
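
As a rough check on how the two figures relate, the sketch below counts the pairs that 70 languages can form. It assumes exactly 70 languages and that every language can be paired with every other; the announcement only says "over 70" and "2,000," so the numbers here are illustrative rather than official.

```python
from math import comb

languages = 70  # assumption: the announcement says "over 70"

# Each pair counted once, regardless of direction.
unordered_pairs = comb(languages, 2)

# Each direction counted separately, e.g. English->Hindi and Hindi->English.
ordered_pairs = languages * (languages - 1)

print(unordered_pairs, ordered_pairs)
```

Under these assumptions, the quoted figure of 2,000 pairs is in line with counting each pair once rather than once per direction.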