Google launches ‘Veo,’ an AI video generation tool alongside Imagen 3 upgrade

Andrew Romero | May 14 2024 - 11:30 am PT

Google has announced a new AI model named “Veo” that takes on video generation tailored to users’ creative visions. Google is also upgrading its image generation model to its third generation, Imagen 3.

Bard was one of our first tastes of a modern LLM from Google. It first launched around a year ago, with major changes coming to the platform in recent months. One of the biggest was a complete name switch, rebranding the user-facing AI tool as Gemini. That name has since spread throughout the company’s product lineup, with Gemini Pro and with Gemini Nano in current and upcoming devices.

Right before Bard was renamed to Gemini, Google added the capability to request images through the AI conversational model. Asking for an image of a cow on a boat would render exactly that, in whatever style you saw fit. That process was powered by Imagen 2, which was the first version to be publicly available.

Google’s Veo model

Today, Google is announcing two creative generation models, Veo and Imagen 3. Veo is the more exciting of the two, as it’s something the public hasn’t been able to try yet. The model is built specifically for video generation and understands visual semantics and natural language, much like other modern models. Applied to video, that approach yields results that can be creatively tailored to fit certain styles.

Google notes that the Veo model will be able to understand “cinematic terms” in the user’s prompts, like aerial shots and timelapse formats. Veo is capable of generating 1080p videos that can last beyond a minute, which surpasses current models like OpenAI’s Sora, which maxes out at 60 seconds.

In Google’s words: “Veo builds upon years of our generative video model work, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere — combining architecture, scaling laws and other novel techniques to improve quality and output resolution.”

Google is inviting creators and filmmakers to put Veo through its paces in order to mold the model so that it can accommodate a wide variety of artistic styles and use cases.

Imagen 3

The Imagen model is also getting a substantial update. Imagen 3 is positioned as Google’s “highest-quality” text-to-image model and offers a few improvements over the Imagen 2 model we’ve seen in Gemini and Bard.

Imagen 3 is said to produce images with a higher level of detail and fewer visual artifacts and impurities. Images are also more photorealistic and lifelike when the prompt calls for it.

Perhaps the biggest improvement is Imagen 3’s ability to render text. Text rendering has become a comical weakness of text-to-image models like DALL-E and Adobe Firefly. Google positions the new model as a way to create personalized images with text, like greeting cards or photos with messages. How well it actually renders text remains to be seen, but the improvement is promising.

Both models will be available in a private preview from Google Labs: Veo through VideoFX, and Imagen 3 through ImageFX. VideoFX will use SynthID to ensure that the content created is digitally watermarked and generated in a responsible manner.

Those who want to take the new models for a spin can sign up through Google’s waitlist.
