
OpenAI released GPT-5.4 Pro: new records in ARC-AGI-2, FrontierMath, and logic


AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

OpenAI has unveiled GPT-5.4 Pro, a new flagship version that makes a notable leap not only in raw metrics but also in how the model behaves on complex tasks. Where the Pro label was previously perceived as simply a more expensive plan, here it looks like a distinct quality tier.

Breakthrough in benchmarks

The headline figure from the review is 83.3% on ARC-AGI-2, versus 54% for the previous version, a gain of 29.3 percentage points. For a class of tasks where a model cannot simply guess a pattern but must actually derive a rule from examples, this is a sharp jump.

This result matters not in isolation but as a signal: OpenAI has strengthened the model's ability to work where surface heuristics fail and where the structure of the task must be held through to the very end. The progress on FrontierMath, a set of problems long considered almost closed territory for mainstream AI models, is no less telling. Such tests were previously used mostly to demonstrate the limits of models; they are now increasingly a way to compare how well a model can build a long chain of reasoning without losing a step.

Against this backdrop, GPT-5.4 Pro looks not just faster or more convenient but noticeably deeper in its reasoning.

Testing beyond the benchmarks

The reviewers did not stop at benchmarks and ran the model through more applied scenarios. Instead of abstract percentages, they looked at how GPT-5.4 Pro handles tasks that require combining logic, planning, and attention to detail. This format is more informative than a standard results table because it shows not one strong skill but the model's behavior under load, where a single error midway through a chain breaks the entire result.

  • Logical puzzles with servers and dependencies between nodes
  • Tasks requiring simultaneous management of multiple conditions
  • Scenarios involving finding non-obvious paths to solutions
  • A full stealth simulator on canvas, where plan and sequence of actions matter

Judging by the test descriptions, the strength of the new version is not only the correct final answer but also stability along the way. The model loses context less often, keeps to constraints better, and is slower to fall back on random guesses when a task falls outside the standard examples in its training corpus. For users, this matters more than a record number in a ranking: it is how real quality gains are felt in daily work.

What was surprising about its behavior

One of the most telling episodes in the review concerns not mathematics but the model's research behavior. While solving a problem, GPT-5.4 Pro found a forgotten 2011 scientific paper online and used it as a shortcut to the answer.

On one hand, this is impressive: the model does not simply recycle memorized patterns but can find external support where it genuinely helps. On the other hand, such an episode immediately raises questions about the limits of autonomy and about how the sources it finds are verified. It marks an important shift in how people interact with AI.

The user increasingly works not with a talking encyclopedia but with a system that combines reasoning, search, and the ability to adapt its strategy to the task. This is exactly why comparing models by token count or response speed alone poorly explains their real value. What becomes key is something else: how reliably the model can think, search, and keep from breaking on a non-standard path.

What this means

The bar for top models has risen again, and GPT-5.4 Pro shows that the next stage of competition is no longer about basic text coherence but about depth of reasoning and resilience in complex scenarios. For the market, this means a faster transition from "smart chatbot" to a working tool for analysis, mathematics, programming, and multi-step tasks where a human previously had to double-check the model at nearly every step.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

