A Year Later: Qwen3 Still Holds the Price/Quality Crown — LLM Model Battletest
LLM battletest results show Qwen3-235B from July 2025 once again leading in price/quality ratio. Over the year, Gemini improved by 40 points, while DeepSeek…
AI-processed from Habr AI; edited by Hamidun News
I ran four LLMs in a single batch to check whether the smaller Gemma had really outperformed the larger one in cross-session tests. The results turned out to be far more interesting than expected.
Head-to-Head: Neither Gemma Surpassed the Other
In a fair head-to-head comparison, the surprising cross-session result disappeared: both Gemma versions came out even, with no difference whatsoever. But that was only the beginning. DeepSeek V4 Flash, which I had previously rated at 83 points, scored 89 this time, 6 points higher. The model had been underrated, and that became the main finding of the batch test. Misrating a single model can distort the entire hierarchy, which is why fair head-to-head comparisons in a single context remain the gold standard.
Qwen Has Held the Crown for a Year
Meanwhile, Qwen3-235B-A22B-2507 (released July 21, 2025) once again took first place in price/quality ratio. This is a July checkpoint, released almost exactly a year ago, and competitors still haven't displaced it. Much has happened over the year. Gemini jumped from 57 to 97 points, a 40-point gain. I re-tested DeepSeek three times and got different results each time. New contenders appeared. But Qwen? It simply holds the throne.
- Gemini: +40 points over the year
- DeepSeek V4 Flash: underrated by 6 points
- Qwen3: still best for price/quality
- MiniMax: generated buzz, solid in tests, but not revolutionary
- Eight new June models: did not displace the leader
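The score changes cited above are simple arithmetic. A minimal sketch, using only the point values quoted in this article; the data layout and the names `SCORES`, `score_delta`, and `DELTAS` are illustrative assumptions, not the author's tooling:

```python
# Earlier and current scores, in points, as quoted in the article.
# The dictionary layout and all names here are illustrative assumptions.
SCORES = {
    "Gemini": (57, 97),
    "DeepSeek V4 Flash": (83, 89),
}


def score_delta(earlier: int, current: int) -> int:
    """Return the change in points between two test runs."""
    return current - earlier


# Change per model: positive means the newer run scored higher.
DELTAS = {model: score_delta(*pair) for model, pair in SCORES.items()}

for model, delta in DELTAS.items():
    print(f"{model}: {delta:+d} points")
```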
New Metrics and MiniMax's Buzz
This rating update adds a new criterion: generation speed. It turned out that speed and quality don't always go hand in hand. A model can be fast yet lag on current data, or vice versa.
MiniMax deserves a separate mention. It is widely praised, and in capabilities it comes close to Opus, but it also arrived with a lot of hype. In fair tests its results are worth attention, but they are not revolutionary enough to rewrite the hierarchy.
What Does This Mean?
If you're weighing quality against price, Qwen3-235B remains the best choice for most tasks. The other models are more specialized: Gemini for multimodality, DeepSeek for experimentation, MiniMax for those willing to pay more.
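The article does not say how the price/quality ratio is computed. One simple reading is quality points per dollar, and the sketch below assumes that definition. The model names, scores, and prices are placeholders invented for illustration, not results from the battletest, and `Entry`, `points_per_dollar`, and `rank_by_value` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    score: float           # quality score in points
    price_per_mtok: float  # hypothetical price in USD per million tokens


def points_per_dollar(entry: Entry) -> float:
    """Assumed metric: quality points bought per dollar."""
    if entry.price_per_mtok <= 0:
        raise ValueError("price must be positive")
    return entry.score / entry.price_per_mtok


def rank_by_value(entries: list[Entry]) -> list[Entry]:
    """Sort entries from best to worst value under the assumed metric."""
    return sorted(entries, key=points_per_dollar, reverse=True)


# Placeholder data, NOT measurements from the article.
EXAMPLE = [
    Entry("model_a", score=90.0, price_per_mtok=3.0),
    Entry("model_b", score=80.0, price_per_mtok=1.0),
    Entry("model_c", score=95.0, price_per_mtok=10.0),
]

RANKED = [entry.name for entry in rank_by_value(EXAMPLE)]
print(RANKED)
```

Under this assumed metric, a cheaper model with a lower raw score can outrank a stronger but pricier one, so the value ranking and the quality ranking need not agree.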