Alibaba releases Qwen3.6-35B-A3B — a multimodal MoE model focused on agentic coding

Q: What is the source?

Originally published on MarkTechPost. Hamidun News processes and adapts the material with AI.

Q: When was it published?

May 2, 2026. Reading time: 3 min.


Hamidun News Editorial



Alibaba's Qwen team has open-sourced Qwen3.6-35B-A3B, a new multimodal model with a sparse MoE architecture. Of its 35 billion total parameters, only about 3 billion are activated during inference, and the main focus is on agentic coding, tool use, and multimodal understanding.

What was released

Qwen3.6-35B-A3B is the first open-weight release in the Qwen3.6 line, following the launch of Qwen3.6-Plus. The model is distributed under the Apache 2.0 license and is available for self-hosting via Hugging Face and ModelScope, as well as through the Alibaba Cloud Model Studio API.

This is not just a text model: it includes a vision encoder, so it accepts images and video, and has a native context of 262,144 tokens with the ability to extend to approximately 1.01 million. The key idea behind the release is high performance with a low number of active parameters.

The model has 35 billion parameters in total, but only about 3 billion are active at each step. According to the model card, the architecture uses 256 experts, of which 8 routed experts and 1 shared expert are active at a time. In practice, this means cheaper inference than large dense models.
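To make the routing idea concrete, here is a minimal toy sketch of top-k expert selection with one always-on shared expert. Only the counts (256 experts, 8 routed, 1 shared) come from the model card as cited above. The softmax over the selected scores, the scalar "experts", and the names `route` and `moe_output` are assumptions chosen for illustration and do not describe Qwen's actual implementation.

```python
import math
import random

NUM_EXPERTS = 256  # total routed experts, as cited from the model card
TOP_K = 8          # routed experts active per token, as cited from the model card


def route(gate_logits, top_k=TOP_K):
    """Return [(expert_index, weight)] for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. This normalization
    is an assumption; the article does not say how router scores are scaled.
    """
    chosen = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)[:top_k]
    peak = max(gate_logits[i] for i in chosen)  # subtract max for stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_output(x, gate_logits, routed_experts, shared_expert):
    """Add the shared expert's output to the weighted routed outputs."""
    y = shared_expert(x)  # the shared expert runs for every token
    for idx, weight in route(gate_logits):
        y += weight * routed_experts[idx](x)
    return y


# Toy scalar "experts" standing in for feed-forward blocks (hypothetical).
routed_experts = [lambda x, s=i: x * (1 + s / NUM_EXPERTS)
                  for i in range(NUM_EXPERTS)]
shared_expert = lambda x: 0.5 * x

rng = random.Random(0)
logits = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]
selection = route(logits)
print(len(selection), "of", NUM_EXPERTS, "routed experts ran for this token")
```

Only the selected experts are evaluated, which is why compute per token tracks the roughly 3 billion active parameters rather than the 35 billion total.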

Qwen3.6 runs in thinking mode by default but also supports direct answers without intermediate reasoning.

Bet on code

Qwen explicitly positions this release as a model for agentic coding, not just another general-purpose chatbot. The developers emphasize that Qwen3.6-35B-A3B is better at frontend tasks, repository navigation, and multi-step tool use. The model integrates with Qwen-Agent, OpenClaw, Qwen Code, and even Claude Code through compatible APIs. For long sessions, there is a separate preserve_thinking feature: it keeps the reasoning chains from previous messages so the agent does not have to rebuild context from scratch at each step.

Tool calling and work with agentic pipelines
Repository analysis across multiple files
Generation and editing of frontend code
Long iterative sessions with preserved reasoning context
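The idea behind preserved reasoning context can be illustrated with a small sketch of how an agent loop might assemble its message history. This is a conceptual illustration only: the `Turn` class, the `reasoning` field, and `build_history` are hypothetical names, and the code does not show how Qwen's preserve_thinking is actually exposed or implemented.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str            # "user" or "assistant"
    content: str         # the visible message text
    reasoning: str = ""  # hypothetical field holding the assistant's reasoning


def build_history(turns, preserve_thinking):
    """Turn a list of Turn objects into chat-style message dicts.

    With preserve_thinking=True, earlier assistant reasoning is carried
    forward; with False, only the visible content is kept.
    """
    messages = []
    for turn in turns:
        message = {"role": turn.role, "content": turn.content}
        if preserve_thinking and turn.role == "assistant" and turn.reasoning:
            message["reasoning"] = turn.reasoning
        messages.append(message)
    return messages


session = [
    Turn("user", "Why does the login test fail?"),
    Turn("assistant", "The token check is inverted in auth.py.",
         reasoning="Traced the failure from test_login to validate_token."),
    Turn("user", "Fix it and rerun the tests."),
]

with_reasoning = build_history(session, preserve_thinking=True)
without_reasoning = build_history(session, preserve_thinking=False)
```

In the first history the agent still sees how it reached its earlier conclusion; in the second it would have to re-derive that path before continuing.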

According to Qwen, the model is strongest in coding and agentic benchmarks. It scores 73.4 on SWE-bench Verified, 51.5 on Terminal-Bench 2.0, 29.4 on NL2Repo, and 1397 on the internal QwenWebBench. This is noticeably higher than Qwen3.5-35B-A3B, and on a number of tasks better than the dense Qwen3.5-27B, which activates far more parameters per token. In other words, Qwen is trying to prove that an open-weight MoE model can be useful not only for local chat, but also for full-fledged dev workflows that need tools, memory of previous steps, and work across an entire codebase.

Multimodality without compromise

Special emphasis is placed on vision and multimodal reasoning. According to Qwen's tables, the model scores 85.3 on RealWorldQA, 92.8 on MMBench EN, 89.9 on OmniDocBench1.5, and 81.9 on CC-OCR. On spatial understanding tasks the results are even more interesting: 92.0 on RefCOCO and 50.8 on ODInW13. The video metrics are also strong: 83.7 on VideoMMMU and 86.2 on MLVU. For a model with 3 billion active parameters, this is a serious claim to universality rather than narrow specialization in code.

In practical terms, Qwen3.6-35B-A3B can be deployed on familiar inference stacks such as vLLM and SGLang, with modes for tool use and for language-only execution if you need to free up memory. In Qwen's examples, the model runs with the full 262K context on eight GPUs, but Qwen separately advises not going below 128K if thinking capabilities are important. For teams that want to keep the model in-house and not depend on closed SaaS, this looks less like an experiment and more like a working solution.
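The deployment guidance above can be captured in a small helper that assembles a `vllm serve` command line. This is a hedged sketch: the Hugging Face model ID is assumed from the model name and not confirmed by the article, "128K" is assumed to mean 131,072 tokens, and `vllm_serve_command` is a hypothetical helper. The only vLLM flags used are `--tensor-parallel-size` and `--max-model-len`; extending past the native context needs extra configuration that is not shown here.

```python
NATIVE_CONTEXT = 262_144         # native context length cited in the article
MIN_THINKING_CONTEXT = 131_072   # assumes "128K" means 128 * 1024 tokens


def vllm_serve_command(model_id, context_len=NATIVE_CONTEXT, gpus=8,
                       thinking=True):
    """Build an argument list for `vllm serve` under the article's guidance."""
    if context_len > NATIVE_CONTEXT:
        # Longer contexts are possible per the article, but not covered here.
        raise ValueError("context_len exceeds the native 262,144 tokens")
    if thinking and context_len < MIN_THINKING_CONTEXT:
        raise ValueError("keep at least 128K of context when thinking matters")
    return [
        "vllm", "serve", model_id,
        "--tensor-parallel-size", str(gpus),   # shard the model across GPUs
        "--max-model-len", str(context_len),   # maximum context window
    ]


# Assumed model ID; check the actual repository name before use.
command = vllm_serve_command("Qwen/Qwen3.6-35B-A3B")
print(" ".join(command))
```

Keeping the limits as named constants makes it easy to adjust them if Qwen's published recommendations differ from what the article reports.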

What this means

Qwen continues to shift the open-weight market toward more practical models: not maximum size for its own sake, but a balance between inference cost, long context, multimodality, and real utility in development. If the stated results are confirmed in real-world scenarios, Qwen3.6-35B-A3B will become one of the most interesting open options for teams that need an AI assistant for code, documents, images, and agentic tasks without mandatory dependence on closed platforms.

