Alibaba released Qwen 3.5 Small — compact models that run directly on devices

Q: What is the source?

Originally published on MarkTechPost. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Mar 3, 2026. Reading time: 3 min.

Hamidun News Editorial

For the past two years, the artificial intelligence industry has lived by a simple formula: the more parameters, the smarter the model. Alibaba has just proposed an alternative. The Qwen team released Qwen 3.5 Small, a family of compact language models ranging from 0.8 to 9 billion parameters and designed specifically to run on user devices without calling cloud servers.

To understand the significance of this release, it's worth looking at the context. Over the past year and a half, the world's leading labs (OpenAI, Google, Anthropic) have competed to build ever larger "frontier" models with hundreds of billions and even trillions of parameters. These models demonstrate impressive results, but they have a fundamental limitation: they require powerful server infrastructure and are available only through the cloud. Each user request travels to a remote data center and back, which creates latency, internet dependency, and data privacy concerns. Alibaba decided to attack precisely this problem.

The philosophy of Qwen 3.5 Small is summed up in its slogan: "More Intelligence, Less Compute." Behind the marketing stands serious engineering work. The family spans from 0.8 billion parameters, small enough to run even on budget smartphones, to 9 billion, a size comfortable for modern laptops and tablets with sufficient RAM. The key question, which open benchmarks have not yet answered comprehensively, is how "intelligent" these models really are compared to competitors of similar size. Previous generations of Qwen consistently showed competitive results, however, and there is no reason to believe the new series will be an exception.
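To make "sufficient RAM" concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at the sizes the article mentions (0.8, 4, and 9 billion parameters). The precision levels (fp16, int8, int4) are an assumption for illustration; the article does not say which formats the Qwen 3.5 Small models ship in. The estimate also ignores the KV cache, activations, and runtime overhead, so actual memory use would be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: the precisions below are common choices for on-device inference,
# not formats confirmed for Qwen 3.5 Small.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Model sizes named in the article, in billions of parameters.
MODEL_SIZES_B = [0.8, 4.0, 9.0]


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


def estimate_table() -> dict:
    """Map (size in billions, precision) to estimated gigabytes, rounded."""
    return {
        (size, precision): round(weight_memory_gb(size, precision), 2)
        for size in MODEL_SIZES_B
        for precision in BYTES_PER_PARAM
    }


if __name__ == "__main__":
    for (size, precision), gb in estimate_table().items():
        print(f"{size}B parameters at {precision}: about {gb} GB of weights")
```

Under these assumptions, the 9 billion parameter model needs roughly 18 GB for weights at 16-bit precision but about 4.5 GB at 4-bit, which is why quantization matters so much for laptops and tablets, while the 0.8 billion parameter model stays under 2 GB even at 16-bit.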

It's important to understand that Alibaba is not a pioneer here, but perhaps the most ambitious player. Microsoft has already released the Phi series, Google has Gemma, and Meta is developing compact versions of Llama. Qwen 3.5 Small stands out for the breadth of its lineup: offering an entire family of models of different sizes under a single architecture gives developers flexibility of choice. The creator of a mobile note-taking app can take the 0.8 billion parameter version for basic auto-correction, while a smart speaker manufacturer can use the 4 billion parameter model for a full-featured voice assistant. The 9 billion parameter model can handle tasks that a year ago required calling a cloud API.

For Alibaba, this release also has a strategic dimension. The Chinese technology giant is fighting for global influence in AI infrastructure, and open models are one of its main tools in this fight. Every developer who integrates Qwen into an application becomes part of Alibaba's ecosystem. Given that US sanctions limit Chinese companies' access to cutting-edge chips for training giant models, the focus on efficient compact models looks not just like a technological choice, but like a strategy that is both forced and far-sighted. If you can't get the best hardware for training trillion-parameter monsters, it makes sense to learn how to squeeze maximum performance from smaller models.

The consequences for the industry could be significant. Models running on devices solve several painful problems at once. Privacy: user data never leaves the phone. Speed: there is no delay from network requests. Accessibility: AI works without an internet connection. Cost: developers don't need to pay for every API call. For billions of users in developing countries, where mobile internet is unstable and expensive, local AI on a smartphone is not a luxury, but the only real path to the technology.

Of course, compact models won't replace frontier models. Complex reasoning tasks, generating long coherent text, multimodal analysis — all of this remains the territory of large models. But the vast majority of everyday tasks — summarization, translation, auto-completion, classification, simple dialogue — are well within the capabilities of models with several billion parameters. And it's these tasks that define the user experience for hundreds of millions of people.

The release of Qwen 3.5 Small confirms a trend that will define the AI industry in the coming years: the future of artificial intelligence is not only in the cloud. It's in every user's pocket, in every device capable of running inference locally. Alibaba made its bet, and now the ball is in the competitors' court.
