Google says these AI models are best for coding Android apps

Ben Schoon | Mar 6 2026 - 4:00 am PT

AI tools, love them or hate them, have been a big deal in coding and app development, and Google is now actively testing out what the best tools are for Android app development – here’s the full list.

The new “Android Bench” is a leaderboard of the best AI models to use for making Android apps. Google actively checks the top LLMs against a benchmark of tests that aim to measure how well these tools handle building Android apps. Google says that it looks at how the models work with Jetpack Compose for UI, Coroutines and Flows for asynchronous programming, Room for persistence, and Hilt for dependency injection. Other points include “navigation migrations, Gradle/build configurations, or the handling of breaking changes across SDK updates,” while Google says that it also measures how these tools work with both core and more niche parts of Android such as camera, system UI, media, foldable adaptation, and more.

Google says that its goal is to show which AI models work best for Android app development, as existing benchmarks don’t cover the challenges a developer might face while working on Android apps.

“AI-assisted software engineering has seen the emergence of several benchmarks to measure the capabilities of LLMs. Android developers face specific challenges that aren’t covered by existing benchmarks, so we created one that focuses on Android development.”

With the methodology out of the way, what is the best AI model for Android app development?

In what shouldn’t be a surprise, Google says that Gemini 3.1 Pro Preview is at the top of the class with a score of 72.4% in the benchmark. Second was Claude Opus 4.6, followed by OpenAI’s GPT-5.2 Codex. The lowest score came from Gemini 2.5 Flash, at just 16.1%.

Best AI for Android app development, according to Google

Gemini 3.1 Pro Preview: 72.4%
Claude Opus 4.6: 66.6%
GPT-5.2 Codex: 62.5%
Claude Opus 4.5: 61.9%
Gemini 3 Pro Preview: 60.4%
Claude Sonnet 4.6: 58.4%
Claude Sonnet 4.5: 54.2%
Gemini 3 Flash Preview: 42%
Gemini 2.5 Flash: 16.1%

Google says that, by publishing these numbers and rankings, it hopes to “encourage LLM improvements for Android development” while also helping developers be “more productive” and, ultimately, deliver “higher quality apps across the Android ecosystem.”