
‘Gemini Live’ lets you talk to Gemini as Google demos ‘Project Astra’ on glasses

At I/O 2024 today, Google announced Gemini Live, a way to have a spoken conversation with Gemini in the mobile app. It will soon be upgraded with conversational video capabilities as part of “Project Astra.”

Gemini Live

Launched from the voice icon of the Gemini app on Android and iOS, Gemini Live offers a fullscreen experience with a cool audio waveform effect. It lets you have a two-way dialogue, with Gemini returning concise responses.

You can speak at your own pace, with Gemini adapting, and interrupt it mid-reply to add new information or ask for clarification. Compared to the single voice available today, there will be 10 different voices to choose from.

Let’s say you’re getting ready for a job interview or rehearsing for an important speech: Just go Live and ask Gemini to help you prepare. Gemini will suggest skills you can highlight when talking to your potential employer, or public speaking tips to calm your nerves before you step up to the podium.

Available for Gemini Advanced subscribers, it’s launching in the coming months. Meanwhile:

Later this year you’ll be able to use your camera when you go Live, opening up conversations about what you see around you.

This is part of adding Project Astra capabilities to Gemini.

Project Astra

Looking ahead, Google DeepMind demoed Project Astra, its effort to build a universal AI agent that’s helpful in everyday life by reasoning about the world in real time and responding quickly.

To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay.

In the Astra demo Google showed, recorded in a single take and running in real time, someone pointed a phone at objects while issuing commands or questions, and Gemini recognized what was in front of it in near real time. You can show it a cityscape and ask what neighborhood you’re in, or ask it about code you show it.

This is built on the Gemini 1.5 Pro model and “other task specific models.” Google says its agents are designed to “process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall,” with reducing response times to “something conversational” a “difficult engineering challenge.”
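Google hasn’t shared implementation details beyond that description, but as a rough illustration of the idea, here’s a minimal Python sketch: each encoded video frame or speech segment becomes a timestamped event on a single merged timeline, with a bounded cache for quick recall. Everything here (Event, EventTimeline, the placeholder embeddings) is a hypothetical illustration, not Google’s actual code or API.

```python
from collections import deque
from dataclasses import dataclass
import time

@dataclass
class Event:
    timestamp: float
    modality: str    # "video" or "speech"
    encoding: list   # stand-in for an embedding vector

class EventTimeline:
    """Hypothetical sketch: merge encoded video frames and speech into
    one ordered timeline, caching recent events for efficient recall."""

    def __init__(self, max_events: int = 256):
        # Bounded cache: the oldest events fall off as new ones arrive.
        self.cache = deque(maxlen=max_events)

    def add(self, modality: str, encoding: list) -> None:
        # Append a newly encoded frame or speech segment to the timeline.
        self.cache.append(Event(time.time(), modality, encoding))

    def recall(self, since_seconds: float) -> list:
        # Return all cached events from the last `since_seconds`.
        cutoff = time.time() - since_seconds
        return [e for e in self.cache if e.timestamp >= cutoff]

# Hypothetical usage: encode streams as they arrive, then recall context
# when the user asks about something seen or said moments ago.
timeline = EventTimeline()
timeline.add("video", [0.12, 0.98])   # placeholder frame embedding
timeline.add("speech", [0.55, 0.31])  # placeholder speech embedding
print(len(timeline.recall(since_seconds=60.0)), "events in the last minute")
```

The bounded cache reflects the trade-off Google’s description implies: keeping everything would make recall slow, so an agent would retain only a recent window of encoded context.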

Even more impressive, Google showed Gemini Live running on smart glasses, with results overlaid on your vision. They appear to be the same translation glasses prototype shown at I/O 2022.




Author

Abner Li

Editor-in-chief. Interested in the minutiae of Google and Alphabet. Tips/talk: abner@9to5g.com