Yuan 3.0 Ultra: one trillion parameters with record efficiency

Q: What is the source?

Originally published on MarkTechPost. Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-03-05. Reading time: 3 min.


Hamidun News Editorial


Yuan 3.0 Ultra: one trillion parameters with record efficiency — Source: MarkTechPost. Collage: Hamidun News.


The trillion-parameter model race is entering a new phase, in which the measure of success is no longer size but the ability to use it efficiently. The Chinese laboratory YuanLab AI has unveiled Yuan 3.0 Ultra, an open multimodal model built on a Mixture-of-Experts (MoE) architecture that holds a trillion parameters but activates only 68.8 billion of them at any given moment. Behind these dry numbers lies a fundamental shift in how large language models are built: instead of scaling compute by brute force, developers are betting on surgical precision in resource utilization.

Some context helps to appreciate this claim. The Mixture-of-Experts architecture is not new. Google uses it in Gemini, it underlies Mistral AI's Mixtral, and according to some leaks OpenAI's GPT-4 uses it as well. The model consists of multiple "expert" subnetworks, and a router activates only a small subset of them for each token. This provides enormous knowledge capacity without passing every token through every parameter. Yuan 3.0 Ultra pushes this idea to its logical limit: of its trillion parameters, fewer than seven percent are active at once. For comparison, Mixtral 8x7B uses roughly 13 billion of its roughly 47 billion parameters per token, a far less aggressive ratio of more than a quarter.
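The routing idea can be illustrated with a toy top-k MoE layer. This is a minimal sketch under stated assumptions, not Yuan 3.0 Ultra's actual router: real layers compute router logits from the token's hidden state with a learned projection, operate on vectors, and use far more experts. Here the logits are hard-coded, the experts are simple scalar functions, and the names `moe_forward`, `softmax`, and the choice `k=2` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Toy top-k MoE layer (hypothetical): only k experts run for this token."""
    # Rank experts by router score and keep the k best (ties keep index order).
    chosen = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the chosen experts only.
    gates = softmax([router_logits[i] for i in chosen])
    # Unchosen experts are never called: this is where the compute saving comes from.
    out = sum(g * experts[i](x) for g, i in zip(gates, chosen))
    return out, chosen


# Hypothetical layer: 4 experts that simply scale their input.
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
y, used = moe_forward(2.0, router_logits=[1.0, 3.0, 3.0, 0.0], experts=experts, k=2)
print(y, used)  # 5.0 [1, 2]
```

Only two of the four experts execute for this token. At Yuan 3.0 Ultra's reported scale the same principle gives 68.8 billion active out of 1,000 billion, or 6.88%, versus about 27.6% (12.9 of 46.7 billion) for Mixtral 8x7B.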

The claimed efficiency metrics are particularly noteworthy. According to YuanLab AI, the total parameter count was reduced by 33.3% relative to an architecturally equivalent baseline, and pretraining efficiency improved by 49%. If these figures hold, comparable answer quality requires substantially less compute and training time. At a time when training a single flagship model costs tens to hundreds of millions of dollars, and access to GPU clusters remains a bottleneck for most companies, such a gain is not just a technical achievement but an economic argument.
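The two percentages are easier to interpret with a quick calculation. This sketch rests on two assumptions that the announcement, as summarized here, does not confirm: that the 33.3% reduction ended at the one-trillion-parameter figure, and that "49% more efficient" means 49% more training throughput. The function names are hypothetical.

```python
def implied_baseline(final_params: float, reduction: float) -> float:
    """Size before a fractional reduction that ended at final_params.

    Assumption: the reduction is measured against the pre-reduction model.
    """
    return final_params / (1.0 - reduction)


def training_time_ratio(efficiency_gain: float) -> float:
    """Relative time for the same training run if throughput rises by
    efficiency_gain (0.49 means +49%). This reading is an assumption."""
    return 1.0 / (1.0 + efficiency_gain)


baseline = implied_baseline(1.0e12, 0.333)
print(f"implied baseline: {baseline / 1e12:.2f}T parameters")  # 1.50T
print(f"time ratio: {training_time_ratio(0.49):.3f}")  # 0.671
```

Under these assumptions, the baseline would have had about 1.5 trillion parameters, and the same pretraining run would take roughly two thirds of the original time. Under the alternative reading that compute itself fell by 49%, the run would instead need 51% of the original compute.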

Multimodality is another important aspect of Yuan 3.0 Ultra. The model is positioned as capable of working not only with text but also with other data types, making it suitable for a wide range of enterprise tasks, from analyzing documents that contain images to scenarios that require combining context from multiple modalities. The architectural decisions behind its multimodality have been only partially disclosed, but the very fact that these capabilities were integrated into a trillion-scale MoE model speaks to the maturity of the approach.

The decision to open-source the model deserves separate attention. Over the past eighteen months, Chinese AI laboratories have steadily expanded their presence in the open community: DeepSeek, Alibaba's Qwen, and 01.AI's Yi all release models with open weights, creating a powerful alternative to closed Western systems.

Yuan 3.0 Ultra fits this trend but raises the bar: a trillion-parameter MoE model with open access is a challenge not only for commercial competitors but for the entire open AI ecosystem. The question is whether researchers and companies outside the largest cloud providers can realistically deploy and use a model of this scale.

Even though "only" 68.8 billion parameters are active at a time, all one trillion must still be held in memory, so inference on a trillion-parameter MoE model requires serious infrastructure for weight storage and for routing tokens between experts.
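A rough estimate shows why storage rather than per-token compute is the bottleneck. This is a minimal sketch that counts weights only; the 80 GB accelerator size is an assumption, the precision Yuan 3.0 Ultra is actually served in is not stated in the source, and the helper names are hypothetical.

```python
import math

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus_for_weights(params: float, dtype: str, gpu_mem_gb: float = 80.0) -> int:
    """Lower bound on accelerators needed just to hold the weights.

    Ignores KV cache, activations and framework overhead, so real
    deployments need more.
    """
    return math.ceil(weight_memory_gb(params, dtype) / gpu_mem_gb)


TOTAL = 1.0e12   # all experts must be resident, not just the active ones
ACTIVE = 68.8e9  # parameters used per token

for dtype in ("bf16", "fp8"):
    print(dtype, weight_memory_gb(TOTAL, dtype), "GB,", min_gpus_for_weights(TOTAL, dtype), "GPUs")
print("active-only bf16:", round(weight_memory_gb(ACTIVE, "bf16"), 1), "GB")
```

At BF16, the full model needs about 2,000 GB for its weights alone, or at least 25 accelerators with 80 GB each (13 at FP8). Storing only the active parameters would take about 138 GB, but a router may pick any expert for the next token, so every expert has to stay resident.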

For the industry, Yuan 3.0 Ultra is further confirmation that MoE is becoming the dominant architecture for next-generation models. Dense transformers, in which every parameter is used for every token, increasingly look like a wasteful approach from a past era. At the same time, the model intensifies competition between Chinese and American laboratories: if independent benchmarks confirm the claimed efficiency metrics, that will be a serious argument that technological leadership in AI is no longer a Silicon Valley monopoly.
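The "wasteful" comparison can be made concrete with a common rule of thumb: a forward pass costs about 2 FLOPs per active parameter per token. This sketch compares a hypothetical dense model of the same total size with the reported 68.8 billion active parameters; it ignores attention and routing overhead, so it is an order-of-magnitude estimate only.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward cost: about 2 FLOPs (a multiply and an add)
    per active parameter per token; ignores attention and routing overhead."""
    return 2.0 * active_params


dense_1t = forward_flops_per_token(1.0e12)  # hypothetical dense model of the same size
moe_1t = forward_flops_per_token(68.8e9)    # Yuan 3.0 Ultra's reported active parameters
print(f"dense/MoE compute per token: {dense_1t / moe_1t:.1f}x")  # 14.5x
```

By this estimate, a dense trillion-parameter model would spend roughly 14.5 times more compute per token. The memory footprint, however, stays the same for both, which is why the saving applies to compute rather than storage.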

Still, bold claims warrant professional skepticism. Until results on standard benchmarks are published alongside GPT-4o, Claude 3.5, Gemini Ultra, and other flagships, talk of "unparalleled efficiency" is premature. The real test of Yuan 3.0 Ultra will begin when the community can work with the weights and conduct independent evaluations. Only then will it become clear whether this model is a genuine breakthrough or another ambitious but overhyped release in an overheated race for scale.
