Google’s Live Captions can now use AI [gasp] to make ‘Expressive Captions’

Google’s Live Captions are becoming richer with new AI-driven “Expressive Captions” that convey more than the spoken words, including sounds and actions. Google is also bringing Gemini 1.5 to Image Q&A in the Lookout app.

Live Caption has been a staple of Google’s Pixel lineup since 2019. The feature allows users to insert captions where there are normally none using the phone’s Tensor SoC and onboard processing. When a voice is heard through a video or other media playing audio, the Pixel phone will pick up on that speech and display it as it hears it. It’s useful for a variety of users, especially those who are deaf/hard of hearing.

Live Captions are getting an overhauled mode for processing audio more dynamically. Google announced that Expressive Captions would allow users to see the nuanced speech and actions in media through Live Captions using AI on-device. That includes decoding tone, volume, and environmental cues. The change will dynamically reflect the way speech is presented.

Google gives a couple of examples of how this will work. When someone yells, that intensity is reflected in the captions, which appear in all caps. Expressive Captions can also decode vocal bursts, such as sighs and groans, detailing the little sounds in between words. Even ambient sounds are represented to fill in the blanks around speech.

In addition, Google announced that image descriptions can now be read aloud. With that, the company is bringing Gemini 1.5 Pro to the Lookout app – an app that aids the vision-impaired. The Q&A feature, which allows users to ask questions about an image, will now be a little more capable. An image can be described in a more natural voice via the Gemini model and will be capable of giving more surrounding information beyond a simple description.

Expressive Captions are a part of Live Caption, so there is no restriction on which Pixel devices can use them. If Live Caption is available on a device, the upgrade will be too. Google does note that the feature will not work with phone calls, though that might change over time.

You’re reading 9to5Google — experts who break news about Google and its surrounding ecosystem, day after day. Be sure to check out our homepage for all the latest news, and follow 9to5Google on Twitter, Facebook, and LinkedIn to stay in the loop. Don’t know where to start? Check out our exclusive stories, reviews, how-tos, and subscribe to our YouTube channel