OpenAI Released GPT-5.4 Mini and Nano — Near-Flagship Quality at Lower Cost
AI-processed from ZDNet AI; edited by Hamidun News
On March 17, 2026, OpenAI unveiled GPT-5.4 mini and GPT-5.4 nano, two compact versions of the GPT-5.4 family designed for tasks where speed, cost, and the ability to run at scale matter most. The notable result is that mini comes close to the full-size GPT-5.4 on several key benchmarks, closer than one typically expects from a "lightweight" model.
Almost Flagship
GPT-5.4 mini is positioned not as a stripped-down compromise but as a working model for real products. According to OpenAI, it scored 54.4% on SWE-Bench Pro against 57.7% for the full GPT-5.4, and 72.1% on OSWorld-Verified against 75.0%. The gap remains, but at roughly three percentage points on each benchmark it is no longer a chasm. At the same time, mini runs more than twice as fast as the previous-generation GPT-5 mini, which in user-facing scenarios often matters more than a few extra percentage points on a benchmark.
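To put the quoted scores side by side, here is a minimal arithmetic sketch. The numbers are the ones reported above; the `gap` helper and the dictionary layout are illustrative names, not anything published by OpenAI.

```python
# Benchmark scores in percent, as quoted in the article (reported by OpenAI).
SCORES = {
    "SWE-Bench Pro": {"mini": 54.4, "full": 57.7},
    "OSWorld-Verified": {"mini": 72.1, "full": 75.0},
}


def gap(mini: float, full: float) -> tuple[float, float]:
    """Return (gap in percentage points, mini's score as a share of full's)."""
    return round(full - mini, 1), round(mini / full, 3)


for name, s in SCORES.items():
    points, share = gap(s["mini"], s["full"])
    print(f"{name}: {points} points behind, {share:.1%} of the full model's score")
```

On both benchmarks mini trails by about three points while retaining well over 90% of the full model's score.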
The smaller GPT-5.4 nano pushes efficiency even further. OpenAI calls it the smallest and cheapest version of GPT-5.4 and recommends it for classification, data extraction, ranking, and simple coding subagents. It is not a general-purpose model but a building block for larger systems, where one strong agent plans the work while a set of small, fast models handles routine tasks. This approach is increasingly moving from research labs into production services.
Where It Will Be Useful
The point of this release goes beyond new names. It reflects a shift in focus: AI products increasingly aim not for maximum answer quality but for a balance between quality, latency, and cost per request. If a model responds quickly, uses tools well, and holds up on multimodal tasks, it often delivers more business value than a heavy flagship that takes longer to think and costs more per request. For high-traffic services, this feeds directly into product economics and user retention.
- AI code assistants with fast fixes and debugging
- Subagents that search codebases and documents in parallel
- Computer-use systems that read screenshots and interfaces
- Multimodal applications working with text and images in real time
- High-volume background tasks such as classification and field extraction
OpenAI separately emphasizes setups that combine models of different classes. In such a scheme, a large model handles planning, coordination, and final verification, while mini or nano perform narrow subtasks in parallel. For developers, this means cheaper orchestration without a major loss of quality. For end users, it means more responsive products where AI no longer feels slow and heavy. That matters more than any one-off jump on a benchmark.
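The planner-and-subagents scheme described above can be sketched in a few lines. This is only an illustration of the control flow, under loud assumptions: `call_model` is a hypothetical stand-in that echoes its input instead of calling any real API, the model labels "large" and "small" are placeholders, and the fixed three-way split in `plan` is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real API call; it only echoes its input."""
    return f"[{model}] {prompt}"


def plan(task: str) -> list[str]:
    """Large model: split the task into narrow subtasks (stubbed split)."""
    call_model("large", f"plan: {task}")
    return [f"{task} / part {i}" for i in range(1, 4)]


def run_subtasks(subtasks: list[str], model: str = "small") -> list[str]:
    """Small model: handle subtasks in parallel; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda s: call_model(model, s), subtasks))


def verify(task: str, results: list[str]) -> str:
    """Large model: final check over the collected results (stubbed)."""
    return call_model("large", f"verify {len(results)} results for: {task}")


def orchestrate(task: str) -> dict:
    """Plan with the large model, fan out to small models, then verify."""
    subtasks = plan(task)
    results = run_subtasks(subtasks)
    return {"results": results, "verdict": verify(task, results)}


if __name__ == "__main__":
    print(orchestrate("audit repo"))
```

A thread pool suits this shape because each subtask would spend most of its time waiting on a network call; swapping the stub for a real client should leave the structure unchanged.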
Prices and Availability
GPT-5.4 mini covers a fairly wide range of applications. The model is available in the API, Codex, and ChatGPT, and supports text and image inputs, tool use, function calling, web search, file search, computer use, and skills. The context window is 400,000 tokens. Pricing is $0.75 per million input tokens and $4.50 per million output tokens. In Codex, mini consumes only 30% of the GPT-5.4 quota, so it can be used for cheaper auxiliary tasks and parallel subagents.
GPT-5.4 nano is available only via the API and costs even less: $0.20 per million input tokens and $1.25 per million output tokens. In ChatGPT, mini is already available to Free and Go users through Thinking mode, and for other users it serves as a fallback for GPT-5.4 Thinking. In practice, this divides the lineup clearly: mini becomes the mainstream working model for everyday products, and nano becomes a utility engine for simple, frequent, and cheap backend operations.
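To see what these per-token prices mean for a real bill, here is a rough cost sketch. The prices are the ones quoted above; the workload (one million requests of 2,000 input and 300 output tokens each) is a made-up example, and the function names are illustrative.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES_USD_PER_MILLION = {
    "mini": {"input": 0.75, "output": 4.50},
    "nano": {"input": 0.20, "output": 1.25},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request for the given model."""
    p = PRICES_USD_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def workload_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a batch of identical requests, rounded to cents."""
    return round(requests * request_cost(model, input_tokens, output_tokens), 2)


# Hypothetical workload: 1,000,000 requests, 2,000 input + 300 output tokens each.
for model in PRICES_USD_PER_MILLION:
    print(model, workload_cost(model, 1_000_000, 2_000, 300))
```

Under that assumed workload, mini comes to $2,850 and nano to $775, which is the kind of difference that makes nano attractive for routine classification and extraction jobs.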
What This Means
The launch of GPT-5.4 mini and nano shows that the model race is entering a new phase: the winners will be not just the most powerful models, but also those that deliver near-flagship results at low cost and minimal latency. For the market, this signals that the next wave of AI products will be built not around one "smartest" model but around combinations of large and small ones. Those combinations, rather than single super-models, are likely to define the next cycle of applied AI.