In the coming months, Google is expected to add a slew of AI features to its products, with some already rumored for I/O 2023. The company is now working to integrate the Imagen text-to-image generator into Gboard for Android.
About APK Insight: In this “APK Insight” post, we’ve decompiled the latest version of an application that Google uploaded to the Play Store. When we decompile these files (called APKs, in the case of Android apps), we’re able to see various lines of code within that hint at possible future features. Keep in mind that Google may or may not ever ship these features, and our interpretation of what they are may be imperfect. We’ll try to enable those that are closer to being finished, however, to show you how they’ll look in case they do ship. With that in mind, read on.
There are strings in the latest beta version (12.7.05.507749191) of Gboard that reference an “Imagen Keyboard.” It will appear in the shortcuts strip/page, like Clipboard, Translate, and One-handed. Development is not that far along yet.
Announced last May amid the DALL-E 2 craze, Imagen pairs a deep level of language understanding with an “unprecedented degree of photorealism.” In a benchmark comparison last year that included VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, Google says human raters preferred “Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.”
“Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.”
Imagen is also said to be better at spatial relations, long-form text, rare words, and challenging prompts. To date, Google has not released code or a public demo, citing concerns about societal impact. At the end of last year, the company said its text-to-image work would eventually be included as part of AI Test Kitchen Season 2:
- City Dreamer: Dream up a city from your imagination and Google’s text-to-image models will bring it to life.
- Wobble: Imagine a monster using Google’s text-to-image models. Using 2D-to-3D animation techniques, “wobble” it to make it dance!
Once added to Gboard, Imagen Keyboard could work similarly to how Emoji Kitchen lets you combine emojis to create stickers. Like AI Test Kitchen, Imagen in Gboard will presumably focus on more light-hearted, expressive outputs.
Thanks to JEB Decompiler, from which some APK Insight teardowns benefit.
Dylan Roussel contributed to this article.