
304 Chinese LLMs: Why Hundreds of Neural Networks Have Yet to Produce a King

China has tallied the results of a large-scale test of 304 language models. Spoiler: there is no "universal king." While developers churn out neural networks as part of the "battle

AI-processed from Jiqizhixin (机器之心); edited by Hamidun News
Source: Jiqizhixin (机器之心). Collage: Hamidun News.

Remember when every new announcement from China was accompanied by cries about the "death of GPT-4"? The dust has settled a bit, and researchers have taken a large-scale inventory of what the famous "thousand-model battle" actually produced. The results are sobering. Testing 304 Chinese large language models showed that the market has no universal leader. One model excels at writing code, another masterfully plays the poet, and a third handles logic fairly well, but no one has yet managed to combine all of this in one bottle. Quantity has turned into anything but quality.

The main problem right now isn't even that models aren't smart enough. The industry has encountered what is called an "evaluation bottleneck." When you have three hundred neural networks in your country, checking each one for adequacy becomes a task of epic proportions.

Traditional benchmarks have long been compromised: developers simply "cheat" by finding the test answers and training their models on them. Getting an honest result requires real people or complex cascading checks, and that costs astronomical sums. At some point, auditing an AI model began to cost companies almost as much as renting graphics cards for training.

Against this backdrop, the solution from the ReLE team looks like an attempt to save venture capitalists' budgets. They proposed the Reinforcement Learning from Evaluation architecture. Without getting bogged down in formulas, this is a way to optimize the testing process itself.

Instead of running a model through thousands of similar questions, the system learns to select only the most informative and difficult tasks. It is as if a professor at an exam immediately asked you the three trickiest questions instead of grilling you for three hours on the entire curriculum. The result is the same, but it takes 70% less time and resources.
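The article gives no implementation details for ReLE, so the following is not its method. It is only a minimal sketch of the general idea of picking informative test items, using a simple static heuristic rather than reinforcement learning: an item on which already-tested models disagree tells you more than one that everyone passes or everyone fails. The function names, the `reference` data, and the variance criterion are all assumptions made for illustration.

```python
from statistics import pvariance

def select_informative_items(results, k):
    """Return the indices of the k items whose pass/fail outcomes vary most
    across reference models (hypothetical helper, not part of ReLE)."""
    n_items = len(next(iter(results.values())))
    scored = []
    for i in range(n_items):
        column = [scores[i] for scores in results.values()]
        # Variance is 0 when all models agree, highest when they split evenly.
        scored.append((pvariance(column), i))
    # Highest variance first; ties are broken by the lower item index.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return sorted(i for _, i in ranked[:k])

def subset_accuracy(scores, items):
    """Accuracy of one model measured only on the selected items."""
    return sum(scores[i] for i in items) / len(items)

# Made-up pass (1) / fail (0) outcomes of four reference models on six items.
reference = {
    "model_a": [1, 1, 1, 0, 1, 0],
    "model_b": [1, 1, 0, 0, 1, 1],
    "model_c": [1, 1, 1, 0, 0, 0],
    "model_d": [1, 1, 0, 0, 0, 1],
}

chosen = select_informative_items(reference, k=3)
new_model = [1, 1, 1, 0, 0, 1]  # outcomes a new model would get on all items
print(chosen, subset_accuracy(new_model, chosen))
```

In this toy data, items 0, 1, and 3 are answered identically by every reference model, so they are skipped. A new model would then be run only on the remaining three. How much a real system saves, and how closely the subset score tracks the full score, depends on the data and cannot be read off this sketch.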

Why does this matter to us? The Chinese AI market has always been an exaggerated reflection of global trends. If complaints about evaluation costs have become widespread there, the problem will soon hit Western startups too.

We're entering an era where "efficiency" becomes more important than "power." Investors no longer want to hear about how many trillion parameters you crammed into your model. They want to know how you plan to prove its viability without spending your entire next financing round on it.

The development landscape is changing too. While giants like Baidu and Alibaba try to build universal systems, small teams find salvation in narrow specialization. The research showed that specialized models often outperform "generalists" in their niches while requiring dozens of times fewer resources.

This calls into question the very idea of building one neural network that will both cook borscht and launch rockets into space. Perhaps the future lies not with one king but with a harmonious council of ministers. The main point: the era of mindless scaling is coming to an end.

Now the winner will be not whoever trains the largest model, but whoever learns to separate the wheat from the chaff fastest and most cheaply. Will ReLE become a new industry standard, or is it just a temporary patch on a bloated market?

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
