
DeepSeek and GLM-5 beat Yandex in a test of 34 AI models for managers without VPN

A large test of 34 models on manager tasks showed that, without VPN in Russia, GLM-5, DeepSeek V3.2 and DeepSeek R1 perform best. The gap with global leaders…

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

The authors of a large-scale test evaluated 34 AI models on typical manager tasks and separately examined which ones are usable in Russia without VPN. The main finding proved uncomfortable for local players: Chinese models performed best, while Yandex solutions turned out to be far from the leaders.

Leaders Without VPN

The study was built not on abstract benchmarks but on 32 practical scenarios: from partner emails and project plans to report analysis, prioritization, hiring, and adaptation to the Russian context. All prompts were written in Russian with no prompt engineering, the way a typical manager would phrase them. Two separate judge models evaluated the answers, and their scores were then consolidated into an overall score on a scale from 1 to 5. This approach was designed to show how models behave in a normal work environment, not in a laboratory.
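The consolidation step can be pictured with a minimal sketch. The article does not say how the two judges' scores were combined, so the plain averaging below is an assumption, and the function name and sample scores are hypothetical.

```python
from statistics import mean


def consolidate(judge_a, judge_b):
    """Combine two judges' per-scenario scores into one overall score.

    Assumption: a plain mean of the two judges for each scenario, then a
    mean across scenarios. The article does not state the real formula.
    """
    if len(judge_a) != len(judge_b):
        raise ValueError("both judges must score the same scenarios")
    for score in (*judge_a, *judge_b):
        if not 1 <= score <= 5:
            raise ValueError("scores must lie on the 1-5 scale")
    # Average the two judges scenario by scenario.
    per_scenario = [(a + b) / 2 for a, b in zip(judge_a, judge_b)]
    # Average across scenarios and round to two decimals, as in the ranking.
    return round(mean(per_scenario), 2)


# Made-up scores for three scenarios, for illustration only.
overall = consolidate([5, 4, 4], [4, 4, 5])
print(overall)
```

Using two judges rather than one reduces the influence of a single model's scoring habits, which may be why the authors chose that setup.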

  • GLM-5 — 4.50 points, free chat and first place in team management tasks
  • DeepSeek V3.2 — 4.41 points, free chat and very cheap API
  • DeepSeek R1 — 4.31 points, stronger in analytics due to reasoning mode
  • Mistral Large — 4.25 points, solid option with chat and API

In the March 17, 2026 update, GLM-5 was added to the ranking and the incorrect claim that Grok is available without VPN was removed. In the current version of the article, GLM-5 takes first place among accessible models, and DeepSeek V3.2 stands out as the most practical option when quality, price, and accessibility are weighed together. The authors specifically emphasize that the difference between tiers shows up in practice rather than on paper: strong models provide answers that can be used almost immediately.

Gap with Global Top

To understand the real ceiling of quality, the authors compared accessible models with those blocked in Russia. The global top included Claude Sonnet 4.5, GPT-5.2 Pro, and Claude Opus 4.5 with an average result of around 4.78 points. The best models accessible without VPN scored an average of 4.36. This is a difference of about 0.4 points: not a chasm, but a transition from the "excellent" to the "good" category.
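The arithmetic behind that gap is easy to check. The sketch below uses only the two averages reported above; the constant names are hypothetical, and the relative figure is an extra illustration, not a number from the study.

```python
GLOBAL_TOP_AVG = 4.78   # reported average of the three blocked leaders
ACCESSIBLE_AVG = 4.36   # reported average of the best models without VPN

# Absolute gap on the 1-5 scale.
gap = round(GLOBAL_TOP_AVG - ACCESSIBLE_AVG, 2)

# The same gap as a share of the global top's score, in percent.
relative = round(gap / GLOBAL_TOP_AVG * 100, 1)

print(gap, relative)
```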

"The answer 'it depends on the task' is honest, but useless."

However, the gap is distributed unevenly. In planning and problem-solving, accessible models nearly catch up with the global top: the lag there is only 0.1–0.2 points and often goes unnoticed in practice. The situation is worse with employee learning and development tasks, for example when you need to create a career plan, a mentoring program, or growth recommendations. Here the lag reaches half a point, so answers need to be checked more carefully. This is where the difference in depth of reasoning and quality of advice becomes apparent.

Why Yandex Lost

The most notable failure in the study was Yandex's. The company's best model, Alice AI LLM, scored 3.84 points and landed only in the third tier, below DeepSeek, Mistral, and even Xiaomi's MiMo v2 Flash.

Even more telling is the result in the regional specificity category, where Russian labor law, local compliance, and cultural context were tested. There Alice scored 3.68 versus 4.56 for GPT-5.2 and 4.34 for DeepSeek V3.2. The authors explain this simply: for business tasks, a model's analytical power matters more than the fact that it was trained on Russian content. In other words, a good global model that works reasonably well with Russian can confidently outpace a "native" model with weaker reasoning.

That said, the authors themselves acknowledge that Yandex uses a different internal comparison methodology, and that in the company's own tests Alice beat the older DeepSeek V3.1 and Qwen on some tasks. But on the set of 32 management scenarios, V3.2 proved stronger than Yandex across all eight categories.

What This Means

For Russian-speaking teams, the AI market no longer boils down to a choice between Western leaders and local products. If you need a working tool without VPN, it now makes more sense to look at DeepSeek and GLM-5: they don't quite reach the absolute top, but they already cover most of a manager's everyday tasks. And the promise of "we understand Russian better" no longer guarantees leadership on its own. For business, this is now a practical choice, not a theoretical one.

