A Year Later: Qwen3 Still Holds the Price/Quality Crown — LLM Model Battletest

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

I ran four LLMs in a single batch to check whether the smaller Gemma had really outperformed the larger one in cross-session tests. The results turned out far more interesting than expected.

Head-to-Head: Neither Gemma Beat the Other

In a fair head-to-head comparison, the surprising cross-session result disappeared: both Gemma versions scored the same, with no difference between them. But that was only the beginning. DeepSeek V4 Flash, which I had previously rated at 83 points, scored 89 this time, 6 points higher. The model had been underrated, and that became the main finding of the batch test. Misrating a single model can distort the entire hierarchy, which is why fair head-to-head comparisons in a single context remain the gold standard.

Qwen Has Held the Crown for a Year

Meanwhile, Qwen3-235B-A22B-2507 (released July 21, 2025) once again took first place in price/quality ratio. It is a July checkpoint, released almost exactly a year ago, and competitors still haven't displaced it. Much has happened over that year. Gemini jumped from 57 to 97 points, a 40-point gain. I re-tested DeepSeek three times and got new results each time. New contenders appeared. But Qwen? It simply holds the throne.

  • Gemini: +40 points over the year
  • DeepSeek V4 Flash: underrated by 6 points
  • Qwen3: still best for price/quality
  • MiniMax: generated buzz, solid in tests, but not revolutionary
  • Eight new June models: did not displace the leader
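The point gains quoted above are simple differences between an earlier and a later score. A minimal sketch of that arithmetic, using only the figures stated in the article (the dictionary layout and the function name `score_deltas` are illustrative assumptions, not the author's tooling):

```python
# Scores quoted in the article as (earlier score, latest score).
# The data layout is an assumption made for illustration.
scores = {
    "Gemini": (57, 97),
    "DeepSeek V4 Flash": (83, 89),
}


def score_deltas(pairs):
    """Return {model: latest - earlier} for each (earlier, latest) pair."""
    return {name: latest - earlier for name, (earlier, latest) in pairs.items()}


deltas = score_deltas(scores)
for name, delta in deltas.items():
    # The "+d" format spec prints an explicit sign for gains.
    print(f"{name}: {delta:+d} points")
```

Keeping both scores per model, rather than only the difference, makes it easy to re-run the comparison when a model is re-tested in a new batch.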

New Metrics and MiniMax's Buzz

The rating update adds a new criterion: generation speed. It turned out that speed and quality don't always go hand in hand: a model can generate quickly yet lag on current data, or vice versa. MiniMax deserves a separate mention. It is widely praised, and in capabilities it comes close to Opus, but it also arrived with a lot of hype. In fair tests it delivers results worth attention, but not revolutionary enough to rewrite the hierarchy.

What Does This Mean?

If you're weighing quality against price, Qwen3-235B remains the best choice for most tasks. The other models are more specialized: Gemini for multimodality, DeepSeek for experimentation, MiniMax for those willing to pay more.
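The article does not say how its price/quality ratio is computed. One plausible reading is benchmark points per dollar, and the sketch below ranks models on that assumption. The model names, scores, and prices are placeholders invented for illustration, not results from the battletest, and `ModelResult`, `points_per_dollar`, and `rank_by_value` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    score: float         # benchmark points, higher is better
    usd_per_mtok: float  # assumed price per million tokens, in USD


def points_per_dollar(model: ModelResult) -> float:
    """Assumed value metric: benchmark points divided by price."""
    if model.usd_per_mtok <= 0:
        raise ValueError(f"price must be positive for {model.name}")
    return model.score / model.usd_per_mtok


def rank_by_value(models):
    """Sort models from best to worst points-per-dollar."""
    return sorted(models, key=points_per_dollar, reverse=True)


# Placeholder inputs only: these are NOT figures from the article.
sample = [
    ModelResult("model_a", score=90.0, usd_per_mtok=3.0),
    ModelResult("model_b", score=80.0, usd_per_mtok=1.0),
    ModelResult("model_c", score=95.0, usd_per_mtok=5.0),
]

ranking = [m.name for m in rank_by_value(sample)]
print(ranking)
```

Under this metric a cheaper model with a lower score can outrank a stronger but pricier one, which is consistent with how a year-old checkpoint could keep the price/quality lead while other models post higher raw scores.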

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
