How AI coding agents work and what you need to know about them
Ars Technica has published a detailed breakdown of how AI agents for coding work — from Cursor and GitHub Copilot to Devin and Claude Code. Key mechanisms inclu
AI-processed from Ars Technica; edited by Hamidun News
In 2025, programmers write code from scratch less and less often. Instead, they formulate a task, launch an AI agent and watch as it generates dozens of files, refactors the architecture and even runs tests. Over the past year, tools like Cursor, GitHub Copilot Workspace, Devin and Claude Code have gone from exotic experiments to everyday reality for hundreds of thousands of developers. But few users truly understand what happens under the hood. Ars Technica has published a detailed breakdown of the internal mechanics of AI code-writing agents, and its conclusions deserve the attention of anyone who trusts the machine with at least part of their work.
The foundation of any coding agent is a large language model, whether it's GPT-4o, Claude 3.5 Sonnet or Gemini. But the model itself is only an engine.
An agent is an entire engineering framework around it. The first and perhaps most critical problem that agent developers solve is the limitation of the context window. Even the most advanced models have a finite one: 128 or 200 thousand tokens sounds impressive, but a real software project can contain millions of lines of code.
Agents handle this through a set of tricks: they index the codebase, build semantic maps of the repository, extract only relevant fragments and compress the context, discarding what the model has already "processed". Essentially, the agent serves as an intelligent librarian who brings the model exactly the books it needs to answer a specific question, rather than dumping the entire library on the table.
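The "librarian" idea can be sketched in a few lines. This is a minimal illustration, not how any particular product works: it assumes relevance can be approximated by keyword overlap (real agents typically rely on embeddings or code-aware indexes), and the names `tokenize`, `score`, `select_context` and the toy `repo` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase identifier-like tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def score(query: str, chunk: str) -> int:
    """Count how often the query's distinct terms occur in the chunk."""
    counts = Counter(tokenize(chunk))
    return sum(counts[term] for term in set(tokenize(query)))


def select_context(query: str, chunks: dict[str, str], budget: int) -> list[str]:
    """Pick the most relevant chunks whose combined token count fits the budget."""
    ranked = sorted(chunks, key=lambda name: score(query, chunks[name]), reverse=True)
    chosen, used = [], 0
    for name in ranked:
        cost = len(tokenize(chunks[name]))
        # Skip chunks that are irrelevant or would overflow the context budget.
        if score(query, chunks[name]) == 0 or used + cost > budget:
            continue
        chosen.append(name)
        used += cost
    return chosen


# Toy "repository": file name -> file contents (assumed data for illustration).
repo = {
    "auth.py": "def login(user, password): return check_password(user, password)",
    "billing.py": "def charge(card, amount): return gateway.charge(card, amount)",
    "utils.py": "def slugify(title): return title.lower().replace(' ', '-')",
}

print(select_context("fix the login password check", repo, budget=20))
```

Only the file that mentions the query's terms is handed to the model; the rest of the "library" stays on the shelf, which is the whole point of working within a fixed token budget.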
The second key mechanism is chain-of-thought reasoning. Instead of generating an answer all at once, the agent breaks a complex task down into sequential steps. First it analyzes the project structure, then formulates a plan for changes, then implements each step, checks the result and adjusts course if necessary.
This is not just a stylistic device: research shows that step-by-step reasoning radically reduces the number of errors when solving complex problems. Some agents go further and use a so-called multi-agent architecture: one model acts as an "architect" and decomposes the task, another writes code, a third handles review, a fourth handles testing. They communicate with each other through structured prompts, imitating the work of a real development team.
This is how Devin from Cognition and a number of other advanced systems are built.
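The division of labor described above can be sketched as a small pipeline. This is only an illustration under loud assumptions: the roles are deterministic stub functions rather than model calls, the tester role is omitted for brevity, and every name here (`Message`, `architect`, `coder`, `reviewer`, `run_pipeline`) is hypothetical and not taken from Devin or any other product.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Structured hand-off passed between roles."""
    sender: str
    content: str


def architect(goal: str) -> list[str]:
    """Decompose a goal into steps (stub: split on ' and ')."""
    return [part.strip() for part in goal.split(" and ")]


def coder(step: str, attempt: int) -> str:
    """Write code for one step (stub: the first attempt forgets to return a value)."""
    name = step.replace(" ", "_")
    body = "pass" if attempt == 0 else "return True"
    return f"def {name}():\n    {body}"


def reviewer(code: str) -> bool:
    """Approve only code that returns something (stub check)."""
    return "return" in code


def run_pipeline(goal: str, max_attempts: int = 3) -> list[Message]:
    """Plan, implement, check, and retry each step until the reviewer approves."""
    log = [Message("user", goal)]
    for step in architect(goal):
        log.append(Message("architect", step))
        for attempt in range(max_attempts):
            code = coder(step, attempt)
            log.append(Message("coder", code))
            approved = reviewer(code)
            log.append(Message("reviewer", "approve" if approved else "revise"))
            if approved:
                break
    return log


transcript = run_pipeline("parse config and validate input")
for message in transcript:
    # Print only the first line of each message to keep the transcript short.
    print(f"[{message.sender}] {message.content.splitlines()[0]}")
```

In this toy run each step is rejected once and approved on the second attempt, which mirrors the "check the result and adjust course" loop in miniature.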
But behind the impressive demonstrations lie serious limitations that are worth remembering. The main one is hallucinations. An agent can confidently use a non-existent API, call functions with incorrect arguments, or create code that looks flawless but contains subtle logical errors. The problem is compounded by the fact that agents operate autonomously: whereas a classic autocomplete tool like early Copilot suggested one line that the developer evaluated immediately, a modern agent can generate hundreds of lines across a dozen files. Checking such a volume manually is a non-trivial task, and the temptation to simply click "accept all" is great.
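One narrow class of hallucination, a call to a module attribute that does not exist, can be caught mechanically before review. The sketch below is an assumption-laden illustration rather than a complete checker: it only inspects plain `import x` statements and direct `x.attr` references, the helper name `missing_attributes` is hypothetical, and it imports the referenced modules, so it should only be pointed at modules you already trust.

```python
import ast
import importlib


def missing_attributes(source: str) -> list[str]:
    """Return 'module.attr' references whose attr is absent from the imported module."""
    tree = ast.parse(source)

    # Map local names to module names for plain `import x` / `import x as y`.
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported[alias.asname or alias.name] = alias.name

    missing = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in imported):
            module = importlib.import_module(imported[node.value.id])
            if not hasattr(module, node.attr):
                missing.append(f"{node.value.id}.{node.attr}")
    return missing


# Example of generated code: json.loads is real, json.to_string is not.
generated = "import json\ndata = json.loads('{}')\ntext = json.to_string(data)\n"
print(missing_attributes(generated))
```

A check like this says nothing about wrong arguments or subtle logic errors, so it complements human review rather than replacing it.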
A separate headache is security. Research has already documented cases where AI agents introduced vulnerabilities into code, from SQL injections to unsafe handling of user input. The model optimizes code for "it works", not "it's secure", and without explicit security requirements it may choose the simplest but insecure path. For teams working with sensitive data or financial systems, this is not an abstract risk but a concrete threat.
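To make the SQL injection risk concrete, here is a small sketch using Python's built-in `sqlite3` module. The table, the data and the function names `find_user_unsafe` and `find_user_safe` are invented for illustration; the point is only the contrast between building a query by string formatting and passing the value as a parameter.

```python
import sqlite3

# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user_unsafe(name: str) -> list[tuple]:
    # Vulnerable: user input is pasted directly into the SQL text.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str) -> list[tuple]:
    # Parameterized: the value is passed separately from the SQL text.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(payload)))  # 2: the injected OR clause matches every row
print(len(find_user_safe(payload)))    # 0: the payload is treated as a literal name
```

Both functions "work" for ordinary names, which is exactly why the unsafe version can slip through when nobody asks for security explicitly.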
There is also a more subtle effect that the industry is only beginning to recognize. When developers rely on an agent to write a significant portion of the code, they gradually lose a deep understanding of their own codebase. Code written by a machine is often stylistically alien to the team, uses unfamiliar patterns and is harder to debug. A paradox emerges: a tool designed to accelerate development can slow down product maintenance in the long term.
None of this means you should abandon AI agents. They really do speed up routine tasks, help you learn unfamiliar frameworks and lower the barrier to entry for programming. But you should treat them like a very capable but inexperienced intern: one who can do a lot, but whose every result requires review by a senior colleague. Understanding exactly how the agent makes decisions (how it compresses context, how it breaks down tasks, where its blind spots are) turns from academic curiosity into a practical skill. In a world where AI agents write more and more code, literacy in their mechanics becomes as fundamental a competency for developers as knowing Git or being able to read logs.