
OpenAI releases GPT-5.4: betting on computer agents, not just code


AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

OpenAI released GPT-5.4 on March 5, 2026 — a new flagship that combines strong coding, reasoning, and native computer control. The main focus of the release has shifted from a "smart chatbot" to an AI agent that not only answers questions but can also execute chains of actions in interfaces and work tools.

The Bet on Agents

The headline feature of GPT-5.4 is built-in Computer Use. The model can work from screenshots, control the cursor, click buttons, fill out forms, and verify the results.

For developers, this marks a more mature transition from simple text generation to agent scenarios: bots can navigate websites, carry out browser steps, and perform routine operations without a hard-coded script for every click. This is no longer a feature for demo videos but a foundational layer for real business processes, where an agent needs to see the interface and confirm that an action actually worked. According to OpenAI's official figures, the improvement is most noticeable where the model must act autonomously rather than simply write code in isolation.
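
The observe, act, verify cycle described above can be pictured as a simple loop. The sketch below is a hypothetical illustration, not OpenAI's Computer Use API: `take_screenshot`, `propose_action`, `execute`, and `verify` are placeholder callbacks standing in for a browser driver and a model call, and the `Action` and `AgentRun` types are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """A single UI step proposed by the (hypothetical) model."""
    kind: str         # e.g. "click", "type", or "done"
    target: str = ""  # element description or coordinates
    text: str = ""    # text to type, if any


@dataclass
class AgentRun:
    steps: list[Action] = field(default_factory=list)
    succeeded: bool = False


def run_agent(
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[bytes, list[Action]], Action],
    execute: Callable[[Action], None],
    verify: Callable[[bytes], bool],
    max_steps: int = 20,
) -> AgentRun:
    """Observe -> decide -> act loop with a hard step budget and a final check."""
    run = AgentRun()
    for _ in range(max_steps):
        screen = take_screenshot()                  # observe the current UI state
        action = propose_action(screen, run.steps)  # model picks the next step
        if action.kind == "done":
            # The model claims completion; confirm it against a fresh screenshot
            # instead of trusting the claim.
            run.succeeded = verify(take_screenshot())
            return run
        execute(action)                             # click, type, scroll, ...
        run.steps.append(action)
    return run  # budget exhausted without a verified result
```

The step budget and the final `verify` call are the two guards that keep such a loop from clicking forever or reporting success on a page where nothing actually happened.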

On OSWorld-Verified, GPT-5.4 reached 75.0% compared to 47.3% for GPT-5.2, and on BrowseComp 82.7% compared to 65.8%. The gain on SWE-Bench Pro was more modest: 57.7% compared to 55.6%. This is a good signal for those building assistants and workflow agents, and a more restrained one for those expecting an unconditional breakthrough in programming.
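
To put the "modest" SWE-Bench Pro result in perspective, the quoted scores can be turned into absolute (percentage-point) and relative gains. This only does arithmetic on the numbers cited in the article; it makes no claim about how the benchmarks were run.

```python
# Scores quoted in the article, as (GPT-5.2, GPT-5.4), in percent.
SCORES = {
    "OSWorld-Verified": (47.3, 75.0),
    "BrowseComp": (65.8, 82.7),
    "SWE-Bench Pro": (55.6, 57.7),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return absolute, relative


for name, (old, new) in SCORES.items():
    pp, rel = gains(old, new)
    print(f"{name}: +{pp} pp ({rel:+}% relative)")
```

On these figures, OSWorld-Verified gains 27.7 points and BrowseComp 16.9, while SWE-Bench Pro gains 2.1.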

  • Native computer control through screenshots and UI actions
  • Tool Search for large tool sets without bloating the prompt
  • `xhigh` reasoning mode for heavy-duty tasks
  • Experimental support for up to 1 million tokens of context in Codex
  • Lower rate of factual errors compared to GPT-5.2
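
The article does not describe how Tool Search works internally. As a rough illustration of the general idea (keep a large tool catalog outside the prompt and pull in only the relevant definitions), here is a keyword-overlap sketch. The tool names are hypothetical, and a real system would more likely rank tools with embeddings than with word matching.

```python
import re
from dataclasses import dataclass

# Tiny stopword list so filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "on"}


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def _tokens(text: str) -> set[str]:
    """Lowercase word set (minus stopwords) used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(query: str, catalog: list[Tool], k: int = 3) -> list[Tool]:
    """Return up to k tools whose name and description overlap most with the query.

    Only these definitions would be placed in the prompt; the rest of the
    catalog stays out of the context window.
    """
    q = _tokens(query)
    scored = [(len(q & _tokens(f"{t.name} {t.description}")), t) for t in catalog]
    # Keep tools with at least one matching word, best matches first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [t for _, t in ranked[:k]]


CATALOG = [
    Tool("create_invoice", "Create a new invoice for a customer"),
    Tool("send_email", "Send an email message to a recipient"),
    Tool("get_weather", "Get the current weather forecast for a city"),
    Tool("refund_payment", "Refund a customer payment"),
]
```

For a request like "refund the customer's last payment", only `refund_payment` and `create_invoice` would be offered to the model, instead of the whole catalog.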

What the Tests Showed

In hands-on tests, the picture was less uniform than in the launch benchmarks. In a visual test, the model was asked to build a complex smart home dashboard, then render the result, check it, and fix its own errors. The overall composition and style were recognizable, but the details fell short: text overlapped blocks, margins were off, some elements were cut off, and the requested neumorphic thermostat turned out to be a simple circle.

In other words, the model already grasps what a "premium interface" should feel like, but it is still far from being an autonomous senior frontend developer. In the backend scenario, however, GPT-5.4 looked more convincing.

Asked to build a production-ready rate limiter for FastAPI and Redis, the model went beyond a basic solution and produced a complete design with strict typing, a Lua script for atomicity, and a local fallback in case Redis goes down. In a logic test with conflicting scheduling constraints, it also behaved correctly: instead of inventing an answer just to have one, it methodically proved that no solution exists. This is an important marker of maturity: the model more often recognizes contradictions instead of confidently hallucinating.
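
The rate-limiter design described above (an atomic Lua script in Redis plus a local fallback) can be sketched roughly as follows. This is not GPT-5.4's actual output: it is a hypothetical, dependency-free fixed-window version in which `redis_client` is assumed to be a `redis-py` client (or any object with the same `eval(script, numkeys, *keys_and_args)` method), and the FastAPI wiring is left out.

```python
import time
from typing import Any, Optional

# Fixed-window counter: INCR and EXPIRE run as one atomic Lua script on the
# Redis server, so concurrent requests cannot race between the two calls.
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""


class RateLimiter:
    """Allow at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: int, redis_client: Optional[Any] = None):
        self.limit = limit
        self.window = window
        self.redis = redis_client  # e.g. redis.Redis(); None means local-only
        # Local fallback state: key -> (window_start, count). Per-process only.
        self._local: dict[str, tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        if self.redis is not None:
            try:
                # redis-py signature: eval(script, numkeys, *keys_and_args)
                count = int(self.redis.eval(FIXED_WINDOW_LUA, 1, f"rl:{key}", self.window))
                return count <= self.limit
            except Exception:
                # Redis unreachable or erroring: degrade to the local counter
                # instead of failing every request. The broad catch keeps the
                # sketch free of a hard dependency on redis.exceptions.
                pass
        return self._allow_local(key)

    def _allow_local(self, key: str) -> bool:
        now = time.monotonic()
        start, count = self._local.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        count += 1
        self._local[key] = (start, count)
        return count <= self.limit
```

Because the fallback counter lives in a single process, limits are only approximate while Redis is down; a sliding-window or token-bucket algorithm would also smooth out the bursts a fixed window allows at window boundaries.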

Price and Availability

OpenAI launched GPT-5.4 on March 5, 2026 directly in the API, Codex, and ChatGPT as GPT-5.4 Thinking.

At launch, the model began replacing GPT-5.2 Thinking for paid ChatGPT Plus, Team, and Pro users, while GPT-5.4 Pro became available on the Pro and Enterprise tiers.

For developers, this may matter as much as the benchmarks themselves: the new flagship did not stay a lab demo but shipped straight into production products. The API pricing is more aggressive than one might expect from an OpenAI flagship: $2.50 per million input tokens, $0.25 per million cached input tokens, and $15 per million output tokens. The model does cost more per token than GPT-5.2, but OpenAI is betting on better token efficiency: if an agent solves a task in fewer steps and wanders through context less, the overall economics could be reasonable even for small teams.
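
The token-efficiency argument is easiest to see with a quick cost calculation at the quoted prices. This sketch assumes cached tokens are counted as part of the input total and, for simplicity, that every agent step has the same token profile, which real runs (where context grows step by step) will not match. The token counts in the example are made up for illustration.

```python
# GPT-5.4 API prices quoted in the article, in USD per 1M tokens.
PRICE_INPUT = 2.50
PRICE_CACHED_INPUT = 0.25
PRICE_OUTPUT = 15.00


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Cost of one call; cached_tokens is the part of input_tokens served from cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    ) / 1_000_000


def agent_run_cost(steps: int, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Total cost if every step of an agent run has the same token profile."""
    return steps * request_cost(input_tokens, cached_tokens, output_tokens)
```

With a hypothetical 100,000 input tokens (40,000 of them cached) and 10,000 output tokens, one call comes to $0.31, so an agent that finishes in 8 steps instead of 12 spends about $2.48 rather than $3.72.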

What This Means

GPT-5.4 doesn't look like a "magical developer replacement," but it clearly shows where the market is heading. The next round of competition is not just about answer quality, but about the model's ability to see interfaces, use tools, keep long context in mind, and drive tasks to completion. For business, this means one simple thing: value is shifting from single prompts to agents that can work inside real processes. These are the scenarios that will decide which platform best suits real work rather than one-off wow demos.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
