Gemini 3.5 Flash lands on Google’s Android coding rankings, but it’s 3x the cost for slower performance

Google has released another set of benchmark results ranking the best AI models for Android coding, along with how much each model costs per benchmark run. Google’s Gemini 3.5 Flash is easily the most resource-intensive model on the list, and it doesn’t even make the top five.

With the hype around general chatbots dying down, companies like Google, OpenAI, and Anthropic are shifting towards agentic models with a strength in coding. Users have begun relying on these models for “vibe coding,” which essentially offloads the bulk of software development to LLMs.

Recent models have dramatically improved their Android coding, and Google has kept tabs on which models perform best over the past few months. The “Android Bench” goes through updates as Google releases its own models, like the recent Gemini 3.5 Flash, and compares them to the competition.

The main takeaway is how Google breaks these models down. Each model gets a score out of 100, indicative of the percentage of Android coding cases it can successfully solve across 10 runs. Google lists expected performance and the date the last test was run, with some high performers sticking around since February.

In the latest edition of Android Bench, the results paint an expensive picture for Google’s new model. Gemini 3.5 Flash ranks 6th on the Android Bench list, below models like GPT 5.5 and Gemini 3.1 Pro Preview, which was tested in February.

Gemini 3.5 Flash was touted as a cheaper and faster alternative to Gemini 3.1 Pro, with an expected performance gap of 6.1%. The new benchmark results say otherwise with regard to Android development: Gemini 3.5 Flash has higher average latency (14.2 versus 11.1) and trails by 8.7 points in score (63.7 versus 72.4).

The kicker: Google’s latest model uses an average of 355.9 tokens at a cost of $147.1 for one benchmark run, compared to Gemini 3.1 Pro Preview’s 73.3 tokens at around a third of that cost ($47.9).

Of course, it’s worth noting that Google lists the preview version of Gemini 3.1 Pro. That being said, the preview model scores higher than a model meant to be faster and more efficient.

GPT 5.5 costs about the same per run ($134.2), but Gemini 3.5 Flash used 5.5x as many tokens in Android Bench tests. Anthropic’s previous model, Claude Opus 4.7, ranked 4th at a slightly lower run cost and far lower token usage than Gemini 3.5 Flash, placing it in the upper half of the list. Google has not released benchmark scores for Opus 4.8 or Fable 5, for that matter.

Here are the top 11 models ranked by Google in the latest Android Bench release:

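| Model | Score | Avg Latency | Avg Total Tokens | Avg Cost |
| --- | --- | --- | --- | --- |
| GPT 5.5 | 74 | 15.7 | 64.7 | $134.2 |
| GPT 5.4 | 72.4 | 21.2 | 64.2 | $91.7 |
| Gemini 3.1 Pro Preview | 72.4 | 11.1 | 73.3 | $47.9 |
| Claude Opus 4.7 | 68.7 | 11.6 | 90.0 | $124.3 |
| Claude Opus 4.6 | 66.6 | 9.9 | 69.5 | $84.4 |
| Gemini 3.5 Flash | 63.7 | 14.2 | 355.9 | $147.1 |
| GLM 5.1 | 59.7 | 33.4 | 80.2 | $46.7 |
| Kimi K2.6 | 58.6 | 29.9 | 94.3 | $42.5 |
| Claude Sonnet 4.6 | 58.4 | 8.2 | 47.9 | $40.4 |
| DeepSeek V4 Pro | 55.4 | 35.8 | 132.7 | $13.7 |
| Claude Sonnet 4.5 | 53.7 | 13.1 | 94.2 | $61.0 |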
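
For readers who want to check the comparisons in this article against the table, here is a minimal Python sketch. The figures are copied from the rows above; the `ROWS` dictionary and the `compare` helper are hypothetical names made up for illustration, not anything Google publishes, and the units for latency and tokens are left out because the table does not state them.

```python
# Figures copied from the Android Bench table in this article.
# Tuple order: (score, avg latency, avg total tokens, avg cost in USD).
# Units for latency and tokens are not stated in the table, so they stay unitless.
ROWS = {
    "GPT 5.5": (74.0, 15.7, 64.7, 134.2),
    "Gemini 3.1 Pro Preview": (72.4, 11.1, 73.3, 47.9),
    "Claude Opus 4.7": (68.7, 11.6, 90.0, 124.3),
    "Gemini 3.5 Flash": (63.7, 14.2, 355.9, 147.1),
}


def compare(a, b):
    """Compare model `a` against model `b` (hypothetical helper for illustration)."""
    score_a, lat_a, tok_a, cost_a = ROWS[a]
    score_b, lat_b, tok_b, cost_b = ROWS[b]
    return {
        # How many score points `a` trails `b` by (negative if `a` leads).
        "score_gap_points": round(score_b - score_a, 1),
        # Ratios above 1.0 mean `a` is slower, heavier, or pricier than `b`.
        "latency_ratio": round(lat_a / lat_b, 2),
        "token_ratio": round(tok_a / tok_b, 2),
        "cost_ratio": round(cost_a / cost_b, 2),
    }


flash_vs_pro = compare("Gemini 3.5 Flash", "Gemini 3.1 Pro Preview")
flash_vs_gpt = compare("Gemini 3.5 Flash", "GPT 5.5")

print("Gemini 3.5 Flash vs Gemini 3.1 Pro Preview:", flash_vs_pro)
print("Gemini 3.5 Flash vs GPT 5.5:", flash_vs_gpt)
```

Run against these rows, the sketch puts Gemini 3.5 Flash at roughly 3.07x the cost of Gemini 3.1 Pro Preview with an 8.7-point lower score, and at about 5.5x the token usage of GPT 5.5.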

The list includes several open-weight models alongside well-known closed-weight models like Claude and GPT. The high end of the list has effectively remained unchanged since the last Android Bench, with the exception of GPT 5.3 Codex, which has been removed from the list.

You can see the full rankings on Google’s website.

Google has regularly updated this list as more models are tested. At its core, it seems like a solid indicator of model performance in Android development. Gemini 3.5 Flash has been a solid improvement for other LLM and agentic tasks, even as Google has shifted cost and usage limits around. Google’s release numbers can’t be disregarded completely, though Android coding is apparently not Gemini 3.5 Flash’s strong suit.
