OpenAI Pops the Hood: How Its Agents Write Code (and Why It Isn't Magic)
While the industry waits for new models, OpenAI has decided to try its hand at transparency. The company has published a detailed breakdown of how its autonomous agents for writing ...
AI-processed from Ars Technica; edited by Hamidun News
OpenAI has broken its silence. After months of guarding its secrets as closely as the Coca-Cola recipe, the company's developers have released details on how its models turn from advanced text autocompletion tools into full-fledged engineers. The subject is the technical architecture of coding agents, and the analysis offers a rare glimpse of a future in which programming ceases to be manual labor. The name OpenAI has looked rather ironic in recent years, but this technical post recalls the days when the company actually shared its expertise.
At the center of attention is the so-called "agent loop." Until now, we were used to neural networks simply outputting a piece of code in response to a request. If it didn't work, that was your problem. That approach has changed fundamentally. OpenAI describes a system that works iteratively: the model writes code, immediately runs it in an isolated sandbox, receives an error report from the compiler, and goes back to the start to fix its own mistakes. The process repeats until the tests pass. In essence, the company has automated the cycle every junior developer goes through in their first year on the job, only thousands of times faster.
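The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's implementation: `run_in_sandbox`, `agent_loop`, and `toy_model` are hypothetical names, the "sandbox" is just a child interpreter in a temporary directory with no real isolation, and the model is replaced by a stub that patches one known bug.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_in_sandbox(source: str, tests: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run candidate code plus its tests in a separate process.

    A temp directory and a child interpreter stand in for a real sandbox;
    this provides no security isolation.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source + "\n\n" + tests, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout, cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        # Exit code 0 means every assertion in the tests held.
        return proc.returncode == 0, proc.stderr


def agent_loop(
    propose: Callable[[str, str], str],
    source: str,
    tests: str,
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Run, read the error report, revise, retry; stop when the tests pass.

    Returns (passing_source, runs_used), or (None, max_iters) on give-up.
    """
    for attempt in range(1, max_iters + 1):
        ok, report = run_in_sandbox(source, tests)
        if ok:
            return source, attempt
        # The "model" sees its own code and the error report it produced.
        source = propose(source, report)
    return None, max_iters


def toy_model(source: str, error_report: str) -> str:
    """Hypothetical stand-in for the model: fixes one known bug."""
    if "AssertionError" in error_report:
        return source.replace("a - b", "a + b")
    return source


BUGGY = "def add(a, b):\n    return a - b\n"
TESTS = "assert add(2, 3) == 5\n"

fixed, attempts = agent_loop(toy_model, BUGGY, TESTS)
print(attempts)  # the first run fails, the second passes
```

The iteration cap matters in practice: without `max_iters`, a model that keeps proposing the same broken fix would loop forever.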
Why does this matter right now? Let's be honest: lately, Claude 3.5 Sonnet from Anthropic has noticeably dampened the mood on Sam Altman's team by taking the lead in programming. Publishing technical details is not just a gesture of good faith but an attempt to show that OpenAI still leads in architectural thinking. The company is betting not on model size but on the sophistication of the system around the model. It turns out that even a less powerful model with the right debugging tools can outperform a far more powerful one used "bare," with no tooling around it. This is a fundamental shift in the industry: we are moving from a race of parameters to a race of agent architectures.
It's interesting how OpenAI describes working with context. Agents no longer read just a single file: they can "look around" the repository and understand the connections between different parts of the project. This addresses the main problem of AI coding, where fixing one line leads the model to break the architecture of the entire application.
Now the agent first builds a dependency map and only then reaches for the virtual keyboard. This approach makes it possible to tackle tasks at the level of SWE-bench, a demanding benchmark in which a model must fix real bugs in open-source projects on GitHub. And judging by the numbers the company provides, we are on the threshold of a moment when AI will be able to close up to half of routine Jira tickets without human involvement.
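To make the "dependency map" idea concrete, here is a rough sketch of what such a map could look like for a small Python project. It is an assumption-laden illustration, not the method OpenAI describes: `build_dependency_map`, `affected_by`, and `demo` are hypothetical names, and the sketch only handles a flat directory of `.py` files with absolute imports.

```python
import ast
import tempfile
from pathlib import Path
from typing import Dict, Set


def build_dependency_map(repo_root: Path) -> Dict[str, Set[str]]:
    """Map each top-level module in repo_root to the local modules it imports."""
    files = {p.stem: p for p in repo_root.glob("*.py")}
    deps: Dict[str, Set[str]] = {name: set() for name in files}
    for name, path in files.items():
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imported = [alias.name.split(".")[0] for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                imported = [node.module.split(".")[0]]
            else:
                continue
            # Keep only imports that resolve to files in this repository.
            deps[name].update(m for m in imported if m in files and m != name)
    return deps


def affected_by(deps: Dict[str, Set[str]], changed: str) -> Set[str]:
    """Modules that import `changed`, directly or through other modules."""
    affected: Set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for module, imports in deps.items():
            if current in imports and module not in affected:
                affected.add(module)
                frontier.append(module)
    return affected


def demo():
    """Build a throwaway four-file repo and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "db.py").write_text("CONN = None\n", encoding="utf-8")
        (root / "models.py").write_text("import db\n", encoding="utf-8")
        (root / "api.py").write_text("import os\nfrom models import User\n", encoding="utf-8")
        (root / "utils.py").write_text("import json\n", encoding="utf-8")
        deps = build_dependency_map(root)
        return deps, affected_by(deps, "db")


deps, affected = demo()
print(sorted(affected))  # modules to re-check after editing db.py
```

With a map like this, an agent about to edit `db.py` knows that `models.py` and `api.py` need to be re-read and re-tested, while `utils.py` can be left alone.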
Of course, there is no escaping the irony here. While OpenAI teaches agents to write perfect code, developers around the world are starting to ask themselves whether they are building the tool for their own unemployment. Full replacement of humans is still far away, though. The main problem is "logic hallucinations," when code is syntactically correct and even passes tests but does something completely different from what the business asked for. OpenAI's agents have not yet learned to push back on poorly written specifications, and that is our temporary salvation. Still, the direction is clear: software development is turning into a process of overseeing an army of autonomous agents rather than writing lines by hand.
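A "logic hallucination" of the kind described above is easy to reproduce on a toy scale. The example below is purely illustrative: the discount rule, the function names, and both test suites are invented for this sketch. The code passes the suite an agent loop would iterate against, yet it violates the business rule at the boundary.

```python
def discounted_total(amount_cents: int) -> int:
    """Assumed business rule: orders of 100.00 or more get 10% off."""
    if amount_cents > 10_000:  # bug: the rule says "or more", so this should be >=
        return amount_cents * 9 // 10
    return amount_cents


def weak_test_suite() -> bool:
    """The tests the agent sees: one large order, one small order."""
    return discounted_total(20_000) == 18_000 and discounted_total(5_000) == 5_000


def spec_check() -> bool:
    """A boundary check derived from the rule itself, not from the code."""
    return discounted_total(10_000) == 9_000


print(weak_test_suite())  # True: the loop would stop here and report success
print(spec_check())       # False: the order at exactly 100.00 gets no discount
```

Nothing in the error-driven loop can catch this, because no error is ever raised; only a check written from the specification exposes the gap.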
The bottom line: the era of simple code prompts is ending, and the age of complex agent systems is beginning. Will OpenAI be able to maintain its lead in this segment, or will more flexible startups overtake it?