OTUS explained on Habr how AI agents for software development work: from tokens to tools

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

May 2, 2026. Reading time: 3 min.

Habr published a useful breakdown of how AI agents for software development actually work. Behind the magic is ordinary engineering: an LLM, a system prompt…

Hamidun News Editorial

AI monitoring · Habr AI

May 2, 2026· 2 min

AI-processed from Habr AI; edited by Hamidun News

OTUS explained on Habr how AI agents for software development work: from tokens to tools — Source: Habr AI. Collage: Hamidun News.

◐ Listen to article

Habr published a detailed breakdown of how AI development agents work. The text dispels the aura of "magic" and shows that behind the convenient interface there are quite specific mechanics: tokens, system prompt, tools, dialogue history, and a cycle of repeated model calls.

Basic Agent Architecture

The article's main idea is simple: a development agent is not a separate type of intelligence, but a wrapper around a large language model. Inside such a system there is the LLM itself, a hidden system prompt with rules of behavior, a list of available tools, and code that runs all of this in a "request → function call → result → new request" cycle. This framework is what turns a model that can continue text into an assistant that writes code, reads files, runs commands, and returns intermediate results.

The basic mechanics of LLM are also discussed separately. A model works not with words, but with tokens — numerical representations of text and images. This is important not only for understanding the architecture, but also for product economics: providers charge money for processed input and output tokens, and also limit the total context size. So even a simple-looking user phrase is part of a chain where each new operation affects the price, latency, and quality of the response.

Context and Price

The article does a good job explaining why a long chat with an agent almost always becomes more expensive. A language model has no inherent memory between requests, so the application is forced to resend the conversation history to it with each subsequent turn. If a user asks to first write a function, then rewrite it for a different library, and then add tests, the entire previous dialogue goes back into the model as input. As the session grows, the cost of each subsequent step increases.

length of system prompt
volume of chat history
number of input and output tokens
caching of repeated prefixes
number of intermediate function calls

Against this backdrop, token caching becomes especially important. If the early part of the prompt doesn't change, the model provider can process it more cheaply because some computations were already done before. This is why good agent systems try to carefully conduct a dialogue, not break stable chunks of context, and not reassemble the request without necessity. Otherwise, an agent can work noticeably more expensively without any real gain in results or speed.

Tools and Reasoning

The key difference between an agent and a regular chat is access to tools. The model receives instructions about which functions it is allowed to call: from reading files and searching code to running Bash or Python. Next, the agent wrapper extracts such a call from the model's response, executes it, and returns the result back into the context. It is through this cycle that an agent can not just "advise," but actually test hypotheses, look at project contents, reproduce errors, and fix code based on facts rather than guesses.

Another layer is the reasoning mode, which gives the model more time and tokens for intermediate analysis of the task. In the article it is described as one of the most noticeable shifts in recent generations of models, especially useful for debugging and analyzing complex execution branches. But the price for this advantage is direct: more computation, higher latency, higher cost.

As stated in the material, an agent is not magic, but a set of architectural decisions. And the quality of such an agent is determined not by one impressive model, but by how the engineer assembled the entire circuit together.

What This Means

The material is useful as an antidote to inflated expectations. If you are using or building an AI agent for development, you should look not only at the model name, but at context window, system prompt, set of tools, loop logic, and the cost of each step — that's where the real limitations and real quality are hidden.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation