After being teased at I/O 2023 in May, Google today detailed Gemini 1.0, its next-generation foundation model, and is making it available through Bard.
Gemini 1.0
As Google’s “most capable and general model,” Gemini can “understand, operate across, and combine” text, code, audio, images, and video. Google says that being “natively multimodal” gives it better understanding, reasoning, and coding capabilities.
The current approach to creating multimodal models involves “training separate components for different modalities and then stitching them together.” While good at certain tasks, Google says these models “struggle with more conceptual and complex reasoning.”
For Gemini, Google “pre-trained from the start on different modalities” using TPU v4 and TPU v5e. Google also announced the TPU v5p today as its “most powerful, efficient, and scalable” AI accelerator, especially for advanced models.
To show off its “sophisticated reasoning” capabilities, Google demoed Gemini digesting 200,000 scientific research papers, filtering out the relevant ones, and then summarizing the data in an hour or so. Coding is another tentpole, with Gemini able to “understand, explain and generate high-quality code” in Python, Java, C++, and Go.
Gemini 1.0 is available in three different sizes that span from data centers to phones:
- Gemini Ultra: Largest and most capable model for highly complex tasks
- Gemini Pro: Best model for scaling across a wide range of tasks
- Gemini Nano: Most efficient model for on-device tasks
Gemini benchmarks
In terms of performance, Google showed Gemini Ultra surpassing GPT-4 in text-based benchmarks that measure reasoning, math, and code. The company is particularly touting how Gemini Ultra is the “first model to outperform human experts on MMLU (massive multitask language understanding)” at 90.0%. That benchmark “uses a combination of 57 subjects such as math, physics, history, law, medicine, and ethics for testing both world knowledge and problem-solving abilities,” with OpenAI’s offering scoring 86.4%.
On the multimodal front, Google shows Gemini Ultra beating GPT-4V across image, video, and audio tests, and Google DeepMind has published a technical report with more specifics.
Google writes: “With the image benchmarks we tested, Gemini Ultra outperformed previous state-of-the-art models, without assistance from object character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini’s native multimodality and indicate early signs of Gemini’s more complex reasoning abilities.”
In terms of safety, Google says Gemini has undergone “the most comprehensive safety evaluations of any Google AI model to date,” with new protections in place to account for its multimodal capabilities. Google says it is specifically working to counter bias and toxicity.
Bard with Gemini Pro
The first way to experience this new foundation model is through “Bard with Gemini Pro.” Rolling out now, this “specifically tuned version” of Gemini Pro offers more advanced reasoning, planning, and writing, as well as content understanding and summarization. Google touted it as outperforming GPT-3.5 in six out of eight benchmarks, including MMLU and GSM8K, and said it delivers the single biggest quality improvement to Bard since launch.
Google adds: “In blind evaluations with our third-party raters, Bard is now the most preferred free chatbot compared to leading alternatives.”
Bard with Gemini Pro is rolling out today in English for 170 countries/territories, with UK and European availability “in the near future.” Initially, Gemini Pro will power text-based prompts, with support for “other modalities coming soon.”
Meanwhile, Gemini Ultra is coming early next year. Google is currently “completing extensive trust and safety checks,” as well as model refinements, before broader availability for developers and enterprise customers.
It will be available through a new “Bard Advanced” offering, which Google positions as providing early access to its most advanced models and capabilities, like Gemini Ultra.
Over the coming months, Gemini will also arrive in Google Search, Chrome, Duet AI, and Ads. In early testing, Gemini reduced SGE (Search Generative Experience) latency by 40%.