36Kr (36氪)

GPT-5.2: OpenAI taught the model to run 40% faster (without "steroids")

OpenAI Developers announced a significant update: GPT-5.2 and Codex now output tokens 40% faster. The engineers did not touch the architecture or the weights, focusing…

AI-processed from 36Kr (36氪); edited by Hamidun News
Source: 36Kr (36氪). Collage: Hamidun News.

Have you ever wondered why your favorite chatbot sometimes "hangs" for a few seconds before giving an answer it clearly already knows? In the world of large language models, time is not just money: it is user experience and, ultimately, a product's survival in the market. Today the OpenAI Developers team added fuel to the speed race by announcing a significant acceleration of its current models.

The update covers GPT-5.2 and the specialized Codex model, which now output tokens 40% faster. The most intriguing part of the news is how this was achieved.

OpenAI engineers emphasized that the speedup came without changing the model's architecture and without retraining its weights. For those not immersed in the details: making a model faster usually means either shrinking it (for example, by distilling it into a smaller model) or retraining it with fewer parameters. Here, the gain comes purely from inference optimization.

It appears that Sam Altman's team has found a way to use the available hardware more efficiently without sacrificing the "brains" of the neural network.
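The source does not say which techniques OpenAI used. As an illustration of the general idea, one widely known inference optimization that leaves weights untouched is key-value (KV) caching: during token-by-token decoding, the keys and values of earlier tokens are stored instead of being recomputed at every step. The toy sketch below assumes a single attention head with tiny, hypothetical weight matrices; it checks that the cached and uncached decoders return identical outputs while the cached one performs far fewer key/value projections.

```python
import math

# Hypothetical toy weights for one attention head (dimension 2); they are never modified.
W_Q = [[1.0, 0.5], [0.0, 1.0]]
W_K = [[0.5, 0.0], [1.0, 1.0]]
W_V = [[1.0, 0.0], [0.0, 2.0]]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values so far."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

def decode_without_cache(tokens):
    """Recompute K and V for the whole prefix at every decoding step."""
    outputs, kv_projections = [], 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        kv_projections += 2 * len(prefix)
        outputs.append(attend(matvec(W_Q, tokens[t - 1]), keys, values))
    return outputs, kv_projections

def decode_with_cache(tokens):
    """Project each new token once and append its K and V to a cache."""
    outputs, kv_projections = [], 0
    k_cache, v_cache = [], []
    for x in tokens:
        k_cache.append(matvec(W_K, x))
        v_cache.append(matvec(W_V, x))
        kv_projections += 2
        outputs.append(attend(matvec(W_Q, x), k_cache, v_cache))
    return outputs, kv_projections

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow, slow_ops = decode_without_cache(tokens)
fast, fast_ops = decode_with_cache(tokens)
print(slow == fast, slow_ops, fast_ops)
```

For this four-token input the outputs match exactly, while key/value projections drop from 20 to 8; the uncached count grows quadratically with sequence length and the cached one only linearly. The model's "brains" (the weight matrices) are identical in both runs.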

Why does this matter right now? The quality of answers from top models, whether GPT, Claude, or Gemini, has reached a plateau, and the difference in reasoning is becoming less and less noticeable to the average user. The battle is therefore shifting to efficiency. If your model generates code of the same quality as a competitor's but does it 1.4 times faster, developers will choose you. For Codex, this is literally a matter of life and death: when you are writing code in an IDE, even a half-second delay is irritating and breaks your train of thought.
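The arithmetic behind the headline figure is worth spelling out. Assuming "40% faster" refers to output throughput (tokens per second), the speedup factor is 1.4x, and the time to stream a fixed-length answer shrinks to 1/1.4, about 71% of the original. (If the figure instead meant 40% less time per token, the factor would be 1/0.6, about 1.67x.) The baseline rate and answer length below are hypothetical placeholders, not OpenAI figures.

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a constant decoding rate."""
    return num_tokens / tokens_per_second

BASELINE_TPS = 50.0   # hypothetical baseline decoding speed, tokens/s
SPEEDUP = 1.40        # "40% faster" read as 1.4x throughput (an assumption)
ANSWER_TOKENS = 700   # hypothetical length of a code completion, in tokens

before = generation_time(ANSWER_TOKENS, BASELINE_TPS)
after = generation_time(ANSWER_TOKENS, BASELINE_TPS * SPEEDUP)
time_saved = 1 - after / before  # equals 1 - 1/SPEEDUP, independent of the placeholders

print(f"{before:.1f}s -> {after:.1f}s ({time_saved:.1%} less waiting)")
```

With these placeholders the wait drops from 14.0 s to 10.0 s, a reduction of about 28.6%. That fraction depends only on the speedup factor, not on the baseline speed or answer length.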

The 40% jump also undercuts hardware startups such as Groq, which build their marketing around extreme token-generation speeds. If OpenAI can keep optimizing its software at this pace, the need for specialized "accelerators" may be less acute than analysts predicted. It is a signal to the entire market: before buying another ten thousand H100s, try rewriting your CUDA kernels and optimizing batching.
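Batching is one of the software levers mentioned above. The sketch below is a simplified simulation, not a description of OpenAI's serving stack: it compares static batching, where a batch runs until its longest request finishes, with continuous (in-flight) batching, where a finished request's slot is refilled on the next step. The request lengths and the batch size of 4 are hypothetical, and every decoding step is assumed to cost the same no matter how many slots are occupied.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Decode steps when each fixed batch runs until its longest request is done."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled on the next step.

    Assumes every request needs at least one token (all lengths > 0).
    """
    queue = deque(lengths)
    active = []  # remaining tokens for each request currently in the batch
    steps = 0
    while queue or active:
        # Fill any free slots from the waiting queue.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # One decoding step produces one token for every active request;
        # requests that just produced their last token leave the batch.
        steps += 1
        active = [remaining - 1 for remaining in active if remaining > 1]
    return steps

# Hypothetical output lengths (in tokens) of eight queued requests.
requests = [10, 200, 15, 20, 180, 12, 25, 30]
print(static_batching_steps(requests, 4), continuous_batching_steps(requests, 4))
```

Under these assumptions the same eight requests finish in 200 steps instead of 380, because short requests no longer hold a slot idle while the longest request in their batch is still generating.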

For businesses building on these models, the update means direct savings: faster inference means less server time, and therefore lower cost, per request. It is unclear whether this will be reflected in API token prices, but historically OpenAI has tended to convert technical efficiency into price cuts to capture market share. A price-list update in the coming weeks that makes Anthropic and Google scratch their heads again seems likely.
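A back-of-the-envelope model shows why throughput turns into savings. Assuming a server is paid for by the hour and decoding throughput is the only thing that changes, the server cost per million generated tokens scales with 1/speedup. The hourly price and throughput below are made-up illustrative numbers; the source gives no figures, and nothing here predicts OpenAI's actual API pricing.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Server cost (USD) attributable to one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

HOURLY_COST = 10.0       # hypothetical USD per server-hour
BASELINE_TPS = 2_000.0   # hypothetical aggregate tokens/s across all batched requests

old = cost_per_million_tokens(HOURLY_COST, BASELINE_TPS)
new = cost_per_million_tokens(HOURLY_COST, BASELINE_TPS * 1.4)  # assumed 1.4x throughput
print(f"${old:.3f} -> ${new:.3f} per 1M tokens")
```

With these placeholders the server cost falls from about $1.389 to about $0.992 per million tokens; whether any of that reaches API customers is a pricing decision, not a consequence of the arithmetic.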

Ultimately, we're seeing that the era of "brute force," when progress was achieved only by increasing the scale of computation, is gradually being supplemented by an era of elegant engineering craftsmanship. OpenAI is clearly signaling that they're not just renting huge clusters from Microsoft, but also know how to squeeze the maximum out of them. This is a good sign for the entire industry: the potential of current architectures is far from exhausted.

The main point: OpenAI is shifting focus from "intelligence" to "speed," and 40% is only the beginning. Are competitors ready for such optimization without losing quality?

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
