
Google announces Gemini 1.5 with greatly expanded context window 

Following the 1.0 launch in December, Google today announced Gemini 1.5 as its next-generation model with “dramatically enhanced performance.”

One of the main advancements in Gemini 1.5 is a significantly larger context window. 

An AI model’s “context window” is made up of tokens, which are the building blocks used for processing information. Tokens can be entire words or subsections of words, images, videos, audio or code. The bigger a model’s context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful.
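
To make those numbers concrete, here is a minimal back-of-envelope sketch in Python. The roughly-4-characters-per-token ratio is a common rule of thumb for English text, not Gemini’s actual tokenizer, and the window sizes are simply the ones discussed in this article:

```python
# Heuristic token estimate for English text (~4 characters per token).
# Real models use learned tokenizers and count image/audio/video input
# as tokens too, so treat this strictly as a rough approximation.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, context_window: int) -> bool:
    """True if the estimated token count fits in the model's window."""
    return estimate_tokens(text) <= context_window

document = "word " * 150_000  # stand-in for a very long transcript
for window in (32_000, 128_000, 200_000, 1_000_000):
    print(f"{window:>9}-token window: fits = {fits_in_context(document, window)}")
```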

Gemini 1.5 Pro — Google’s middle tier — has a standard context window of 128,000 tokens, up from 32,000 tokens for Gemini 1.0. For comparison, GPT-4 Turbo is also at 128,000 tokens, while Claude 2.1 offers 200,000. More notably, Google has run a window of up to 1 million tokens in production; that translates to over 700,000 words, codebases with more than 30,000 lines of code, 11 hours of audio, or 1 hour of video. Examples of that in action include:

  • “1.5 Pro can seamlessly analyze, classify and summarize large amounts of content within a given prompt. For example, when given the 402-page transcripts from Apollo 11’s mission to the moon, it can reason about conversations, events and details found across the document.”
  • “1.5 Pro can perform highly-sophisticated understanding and reasoning tasks for different modalities, including video. For instance, when given a 44-minute silent Buster Keaton movie, the model can accurately analyze various plot points and events, and even reason about small details in the movie that could easily be missed.”
  • “1.5 Pro can perform more relevant problem-solving tasks across longer blocks of code. When given a prompt with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications and give explanations about how different parts of the code work.”

Google is making that 1 million-token window available to some early testers, and it has “successfully tested up to 10 million tokens” with text.

These advancements are made possible by a new Mixture-of-Experts (MoE) architecture where models are “divided into smaller ‘expert’ neural networks.” This makes Gemini 1.5 more efficient to both train and serve.

Depending on the type of input it’s given, an MoE model learns to selectively activate only the most relevant expert pathways in its network. This specialization makes the model significantly more efficient to run.
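
To make the routing idea concrete, here is a toy sketch in Python/NumPy. It is illustrative only (Gemini’s actual architecture is not public beyond the MoE description), but it shows the core mechanic: a small router scores every expert for a given input, and only the top-scoring experts actually run.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2  # hidden size, expert count, experts run per input

# One small weight matrix per "expert" network, plus learned router weights.
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route an input vector to its top-k experts and mix their outputs."""
    logits = x @ router                # score each expert for this input
    top = np.argsort(logits)[-TOP_K:]  # keep only the most relevant experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts
    # Only TOP_K of N_EXPERTS networks execute; that is the efficiency win.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.standard_normal(D)).shape)  # (16,)
```

Because compute scales with the number of experts selected rather than the total number of experts, capacity can grow without every input paying for every parameter.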

In terms of performance, 1.5 Pro outperforms 1.0 Pro on 87% of the benchmarks across text, code, image, audio, and video evaluations. It even “performs at a broadly similar level” to 1.0 Ultra.

Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning that it can learn a new skill from information given in a long prompt, without needing additional fine-tuning.
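
Concretely, “in-context learning” means the reference material rides along in the prompt rather than being baked into the model’s weights. A toy sketch follows; the mini-language and prompt format are invented purely for illustration:

```python
# In-context learning: the "training data" lives in the prompt, not the weights.
# The fictional mini-language below is invented for illustration only.
reference_material = """Glossary for a fictional mini-language:
  'ka'   -> water
  'ru'   -> fire
  'karu' -> steam (compound: water + fire)
"""

worked_examples = """Translate: 'ka' -> water
Translate: 'ru' -> fire
"""

query = "Translate: 'karu' ->"

# With a million-token window, reference_material could be an entire grammar
# book or codebase; the model picks up the "skill" from the prompt alone.
prompt = "\n".join([reference_material, worked_examples, query])
print(prompt)
```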

Gemini 1.5 Pro is launching as a limited preview for developers and enterprise customers via AI Studio and Vertex AI, and Google describes the model as experimental during this period.
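
For developers in the preview, access looks like an ordinary API call. Here is a sketch using the google-generativeai Python SDK; the model identifier and the input file are assumptions on my part, so check AI Studio for what is actually available to your account:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key comes from AI Studio

# Model name is an assumption for the preview period; confirm in AI Studio.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Hypothetical long document: the point is the whole thing fits in context.
with open("apollo11_transcript.txt") as f:
    transcript = f.read()

response = model.generate_content(
    ["Summarize the key events in this transcript:", transcript]
)
print(response.text)
```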

Coming soon, Google plans to introduce pricing tiers that start at the standard 128,000-token context window and scale up to 1 million tokens as it improves the model.


Author: Abner Li, Editor-in-chief. Interested in the minutiae of Google and Alphabet. Tips/talk: abner@9to5g.com
