Google says these AI models are best for coding Android apps

AI tools, love them or hate them, have become a big deal in coding and app development, and Google is now testing which models work best for Android app development. Here’s the full list.

The new “Android Bench” is a leaderboard of the best AI models to use for making Android apps. Google runs the top LLMs against a benchmark of tests designed to measure how well they handle building Android apps. Google says that it looks at how the models work with Jetpack Compose for UI, Coroutines and Flows for asynchronous programming, Room for persistence, and Hilt for dependency injection. Other points include “navigation migrations, Gradle/build configurations, or the handling of breaking changes across SDK updates,” while Google says that it also measures how these tools handle both core and more niche parts of Android, such as camera, system UI, media, and foldable adaptation.

Google says that its goal is to show which AI models work best for Android app development, as existing benchmarks don’t cover the challenges a developer might face while working on Android apps.

“AI-assisted software engineering has seen the emergence of several benchmarks to measure the capabilities of LLMs. Android developers face specific challenges that aren’t covered by existing benchmarks, so we created one that focuses on Android development.”

With the methodology out of the way, what is the best AI model for Android app development?

In what shouldn’t be a surprise, Google says that Gemini 3.1 Pro Preview is top of the class with a score of 72.4% in the benchmark. Second was Claude Opus 4.6, followed by OpenAI’s GPT-5.2 Codex. The lowest score came from Gemini 2.5 Flash, at just 16.1%.

Best AI for Android app development, according to Google

  • Gemini 3.1 Pro Preview: 72.4%
  • Claude Opus 4.6: 66.6%
  • GPT-5.2 Codex: 62.5%
  • Claude Opus 4.5: 61.9%
  • Gemini 3 Pro Preview: 60.4%
  • Claude Sonnet 4.6: 58.4%
  • Claude Sonnet 4.5: 54.2%
  • Gemini 3 Flash Preview: 42%
  • Gemini 2.5 Flash: 16.1%

Google says that, by publishing these numbers and rankings, it hopes to “encourage LLM improvements for Android development” while also helping developers be “more productive” and, ultimately, deliver “higher quality apps across the Android ecosystem.”


Author

Ben Schoon

Ben is a Senior Editor for 9to5Google.

Find him on Twitter @NexusBen. Send tips to schoon@9to5g.com or encrypted to benschoon@protonmail.com.