In addition to Imagen, Google has another text-to-image generator called Parti that also aims for photorealism but uses a different family of generative models.

Pathways Autoregressive Text-to-Image (Parti) uses an autoregressive model that can “benefit from advances in large language models.” By comparison, Imagen uses diffusion, in which the model learns to convert a pattern of random dots into an image.

Parti’s approach first converts a collection of images into a sequence of code entries, similar to puzzle pieces. A given text prompt is then translated into these code entries and a new image is created. This approach takes advantage of existing research and infrastructure for large language models such as PaLM and is critical for handling long, complex text prompts and producing high-quality images.
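To make the "code entries" idea concrete, here is a minimal toy sketch in Python. It is not Parti's implementation: the five-entry codebook, the one-number "patches," and the toy_next_token_probs function are all hypothetical stand-ins for a learned image tokenizer and a trained neural network. It only illustrates the two steps described above: turning image pieces into discrete code entries, and generating new code entries one at a time from a text prompt.

```python
import random

# Hypothetical toy codebook: each entry stands in for one learned "puzzle piece".
CODEBOOK = [0.0, 0.25, 0.5, 0.75, 1.0]


def tokenize(patches):
    """Map each patch value to the index of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - p))
        for p in patches
    ]


def detokenize(tokens):
    """Turn code entries back into approximate patch values."""
    return [CODEBOOK[t] for t in tokens]


def toy_next_token_probs(prompt, tokens_so_far):
    """Stand-in for a trained model (an assumption for illustration only).

    Returns one probability per codebook entry, derived deterministically
    from the prompt and the tokens generated so far.
    """
    seed = sum(ord(c) for c in prompt) * 31 + sum(tokens_so_far) + len(tokens_so_far)
    rng = random.Random(seed)
    weights = [rng.random() + 1e-6 for _ in CODEBOOK]
    total = sum(weights)
    return [w / total for w in weights]


def generate(prompt, length, seed=0):
    """Autoregressive loop: sample each code entry given the prompt and prior entries."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        probs = toy_next_token_probs(prompt, tokens)
        tokens.append(rng.choices(range(len(CODEBOOK)), weights=probs)[0])
    return tokens


if __name__ == "__main__":
    print(tokenize([0.1, 0.3, 0.9]))
    image_tokens = generate("a cat wearing a hat", length=8)
    print(image_tokens, detokenize(image_tokens))
```

In a real system, the codebook would be learned from images and the next-entry probabilities would come from a large sequence model conditioned on the text prompt, which is why this family of models can reuse language-model research and infrastructure.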

Google found that Parti can “manage long, complex prompts” that:

Accurately reflect world knowledge

Are composed of many participants and objects, with fine-grained details and interactions

Adhere to a specific image format and style

As with Imagen, Google has opted not to release Parti’s “models, code, or data for public use without further safeguards in place.” All images are watermarked in the bottom-right corner.

Current models like Parti are trained on large, often noisy, image-text datasets that are known to contain biases regarding people of different backgrounds. This leads such models, including Parti, to produce stereotypical representations of, for example, people described as lawyers, flight attendants, homemakers, and so on, and to reflect Western biases for events such as weddings.

Google is exploring this area and thinks tools like these “can unlock joint human/computer creativity.” The full research paper for Parti is available here, while the interactive website lets you change up the word prompts.

Our goal is to bring user experiences based on these models to the world in a safe, responsible way that will inspire creativity.

