Mistral released Small 4 — a 119-billion-parameter MoE model for reasoning, code, and multimodality

Q: What is the source?

Originally published on MarkTechPost. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Apr 30, 2026. Reading time: 3 min.

Mistral introduced Small 4, a new open-source 119-billion-parameter MoE model that combines standard chat, reasoning, agentic coding, and multimodality. The…

Hamidun News Editorial

AI monitoring · MarkTechPost

Apr 30, 2026· 3 min

AI-processed from MarkTechPost; edited by Hamidun News

Mistral released Small 4 — a 119-billion-parameter MoE model for reasoning, code, and multimodality — Source: MarkTechPost. Collage: Hamidun News.

◐ Listen to article

Mistral AI presented Mistral Small 4 — a new open model that should replace several separate product lines with a single universal endpoint. Instead of a separate instruct-model, reasoning-model, vision-model, and coding-agent, developers are offered one MoE-checkpoint with switchable reasoning depth.

One instead of four

The main idea of the release is not that Mistral simply scaled up the number of parameters. Small 4 consolidates into one product the roles that were previously distributed between Mistral Small for regular instructions, Magistral for complex reasoning, Pixtral for multimodal understanding, and Devstral for agentic programming. For teams building products on top of LLMs, this matters more than another benchmark score record: less routing between models, simpler infrastructure, fewer chances of getting different response styles on neighboring steps of a single scenario.

"Users no longer need to choose between fast instruct mode, reasoning, and a multimodal assistant,"

Mistral's announcement states.

In positioning, Small 4 targets several types of tasks at once: regular chat, code work, agentic workflows, and analysis of complex documents or images. Mistral directly positions the model as a universal layer for enterprise tasks, where a single API surface needs to combine text and visual requests. This is especially noticeable against a market where many teams still maintain separate models for chat, separate ones for reasoning, and separate ones for vision tasks.

How the model is structured

Architecturally, it's a Mixture-of-Experts model with 119 billion parameters. Inside — 128 experts, of which only four are activated per token, so Mistral is betting not on maximum density, but on efficiency at runtime. The company also claims a 256k context window and native support for text and images.

The release is open under the Apache 2.0 license, meaning the model can not only be used via API, but also deployed and fine-tuned for your own scenarios.

119 billion parameters in total architecture
128 experts and 4 active experts per token
Context window 256k
Inputs: text and images
Apache 2.0 license and availability for self-hosting

Mistral places particular emphasis on the reasoning_effort parameter. Essentially it's a switch between a fast response and a heavier mode of step-by-step reasoning. In none mode, the model should behave closer to Mistral Small 3.2 and deliver lighter answers with low latency. In high mode — work closer to the Magistral lineup, where quality of reasoning on complex tasks matters more than speed. The practical sense is simple: instead of a bundle of two or three models, you can maintain one deployment and change behavior at the request level.

Speed and launch

In the official announcement, Mistral bets not only on universality, but also on inference economics. The company claims a 40% reduction in full generation time in a latency-optimized configuration, and a threefold increase in requests per second in a throughput-optimized scenario, compared to Mistral Small 3. Separately, Mistral emphasizes that Small 4 with reasoning enabled shows comparable or higher results than GPT-OSS 120B on AA LCR, LiveCodeBench, and AIME 2025, while generating shorter answers. These comparisons are published by the company itself, but the focus on "quality per token" for production is indeed important.

For launch, Mistral immediately lists practical options. The model is available via Mistral API and AI Studio, uploaded to Hugging Face, and announced for vLLM, llama.cpp, SGLang, and Transformers stacks. For self-hosting, requirements are no longer "desktop": minimum configuration is listed as 4x NVIDIA HGX H100, 2x HGX H200, or 1x DGX B200, with more powerful setups recommended for better performance. So Small 4 looks like an open model not for a laptop, but rather for serious server infrastructure and product teams for whom control, customization, and predictable cost of ownership matter.

What this means

Mistral is moving the open-source segment toward more universal models, where the main advantage is not only quality, but also simplification of the entire system around LLMs. If Small 4 confirms its claimed efficiency in real production workloads, the company will gain a strong argument against a zoo of separate reasoning, vision, and coding models. For business, it's a chance to reduce the complexity of the orchestration layer, and for developers — to get one customizable base layer for a wide range of tasks.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation