
Google just tested a bunch of new AI models for Android app coding – here are the rankings

Google has once again updated its “Android Bench” rankings of the best AI models for Android app development. The update adds a batch of new “open-weight” models, along with data on latency, token usage, and the cost of running each model.

Coding is one of the things large language models have gotten really good at, and their ability to help build apps and other software has fueled the rise of “vibe coding.” Earlier this year, Google published a benchmark ranking of the “best” AI models for Android app development, based on how they handle common Android development tasks and Android best practices.

When the “Android Bench” first debuted, Gemini 3.1 Pro led the pack, and OpenAI’s GPT 5.4 later tied for the top slot.

As of the May 18, 2026 update, there’s a new king in town. According to Google, GPT 5.5 is currently the best AI model for Android app development, scoring 74 and beating GPT 5.4 and Gemini 3.1 Pro (both at 72.4) by 1.6 points.


This latest update also adds some much-needed context: Google now shows the average latency, average total tokens used, and average cost of running each AI model. Google explains how it arrived at each metric in the benchmark’s documentation.

  • Average Latency: Time taken to solve 100 tasks, averaged across 10 runs
  • Average Total Tokens: Token consumption for a full benchmark run, averaged across 10 runs
  • Average Cost: Cost per benchmark run in US dollars, at prices at the time of testing

With that in mind, we can see that while GPT 5.5 is a bit stronger, it costs roughly 2.7 times as much per benchmark run as Gemini 3.1 Pro ($133.9 vs. $49.0).

Here are the top ten models according to Google, including the new data (as of May 21, 2026):

Model | Score | Avg Latency | Avg Total Tokens | Avg Cost (USD)
New: GPT 5.5 | 74 | 15.5 | 64.5 | $133.9
GPT 5.4 | 72.4 | 21.2 | 64.2 | $91.7
Gemini 3.1 Pro Preview | 72.4 | 11.5 | 75.4 | $49.0
New: Claude Opus 4.7 | 68.7 | 11.6 | 90.0 | $124.3
GPT 5.3 Codex | 67.7 | 11.2 | 71.4 | $42.6
Claude Opus 4.6 | 66.6 | 9.9 | 69.5 | $84.4
GPT 5.2 Codex | 62.5 | 24.3 | 124.4 | $121.9
Claude Opus 4.5 | 61.9 | 12.5 | 79.8 | $102.5
Gemini 3 Pro Preview | 60.4 | 9.8 | 117.0 | $63.7
New: GLM 5.1 | 59.7 | 33.4 | 80.2 | $46.7
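For readers who want to check the score-versus-cost trade-off themselves, here is a minimal Python sketch using the figures from Google’s table. The `results` dictionary and the `compare` helper are our own illustration, not part of Android Bench; the numbers are copied by hand from the rankings.

```python
# Android Bench figures copied by hand from Google's table (data as of May 21, 2026).
# "cost" is the average cost of one full benchmark run, in US dollars.
results = {
    "GPT 5.5": {"score": 74.0, "cost": 133.9},
    "GPT 5.4": {"score": 72.4, "cost": 91.7},
    "Gemini 3.1 Pro Preview": {"score": 72.4, "cost": 49.0},
    "Claude Opus 4.7": {"score": 68.7, "cost": 124.3},
    "GPT 5.3 Codex": {"score": 67.7, "cost": 42.6},
}


def compare(model: str, baseline: str) -> tuple[float, float]:
    """Hypothetical helper: return (score gap in points, cost ratio) of model vs. baseline."""
    m, b = results[model], results[baseline]
    score_gap = round(m["score"] - b["score"], 1)
    cost_ratio = round(m["cost"] / b["cost"], 2)
    return score_gap, cost_ratio


gap, ratio = compare("GPT 5.5", "Gemini 3.1 Pro Preview")
print(f"GPT 5.5 leads by {gap} points at {ratio}x the cost")
```

Run as written, this prints that GPT 5.5 leads Gemini 3.1 Pro by 1.6 points at 2.73 times the cost per run; swapping in "GPT 5.4" as the baseline gives the same 1.6-point lead at about 1.46 times the cost.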

As mentioned, there are more open-weight models now in the rankings, including Gemma, Qwen, DeepSeek, MiMo, and more. Of these, GLM 5.1 scored the highest, followed by Kimi K2.6.

You can see the full rankings on Google’s website.

Google continues to update the “Android Bench” roughly once a month. With Gemini 3.5 Pro coming soon and 3.5 Flash already live, it’ll be interesting to see whether Google’s own models can retake the lead from OpenAI.

Are you using an AI model for Android app development? If so, which one?


Follow Ben: Twitter/X, Threads, Bluesky, and Instagram

FTC: We use income-earning auto affiliate links.

You’re reading 9to5Google — experts who break news about Google and its surrounding ecosystem, day after day. Be sure to check out our homepage for all the latest news, and follow 9to5Google on Twitter, Facebook, and LinkedIn to stay in the loop. Don’t know where to start? Check out our exclusive stories, reviews, how-tos, and subscribe to our YouTube channel.


Author

Ben Schoon

Ben is a Senior Editor for 9to5Google.

Find him on Twitter @NexusBen. Send tips to schoon@9to5g.com or encrypted to benschoon@protonmail.com.