Habr AI→ original

OTUS explained on Habr how AI agents for software development work: from tokens to tools

Habr published a useful breakdown of how AI agents for software development actually work. Behind the magic is ordinary engineering: an LLM, a system prompt…

AI-processed from Habr AI; edited by Hamidun News
OTUS explained on Habr how AI agents for software development work: from tokens to tools
Source: Habr AI. Collage: Hamidun News.
◐ Listen to article

Habr published a detailed breakdown of how AI development agents work. The text dispels the aura of "magic" and shows that behind the convenient interface there are quite specific mechanics: tokens, system prompt, tools, dialogue history, and a cycle of repeated model calls.

Basic Agent Architecture

The article's main idea is simple: a development agent is not a separate type of intelligence, but a wrapper around a large language model. Inside such a system there is the LLM itself, a hidden system prompt with rules of behavior, a list of available tools, and code that runs all of this in a "request → function call → result → new request" cycle. This framework is what turns a model that can continue text into an assistant that writes code, reads files, runs commands, and returns intermediate results.

The basic mechanics of LLM are also discussed separately. A model works not with words, but with tokens — numerical representations of text and images. This is important not only for understanding the architecture, but also for product economics: providers charge money for processed input and output tokens, and also limit the total context size. So even a simple-looking user phrase is part of a chain where each new operation affects the price, latency, and quality of the response.

Context and Price

The article does a good job explaining why a long chat with an agent almost always becomes more expensive. A language model has no inherent memory between requests, so the application is forced to resend the conversation history to it with each subsequent turn. If a user asks to first write a function, then rewrite it for a different library, and then add tests, the entire previous dialogue goes back into the model as input. As the session grows, the cost of each subsequent step increases.

  • length of system prompt
  • volume of chat history
  • number of input and output tokens
  • caching of repeated prefixes
  • number of intermediate function calls

Against this backdrop, token caching becomes especially important. If the early part of the prompt doesn't change, the model provider can process it more cheaply because some computations were already done before. This is why good agent systems try to carefully conduct a dialogue, not break stable chunks of context, and not reassemble the request without necessity. Otherwise, an agent can work noticeably more expensively without any real gain in results or speed.

Tools and Reasoning

The key difference between an agent and a regular chat is access to tools. The model receives instructions about which functions it is allowed to call: from reading files and searching code to running Bash or Python. Next, the agent wrapper extracts such a call from the model's response, executes it, and returns the result back into the context. It is through this cycle that an agent can not just "advise," but actually test hypotheses, look at project contents, reproduce errors, and fix code based on facts rather than guesses.

Another layer is the reasoning mode, which gives the model more time and tokens for intermediate analysis of the task. In the article it is described as one of the most noticeable shifts in recent generations of models, especially useful for debugging and analyzing complex execution branches. But the price for this advantage is direct: more computation, higher latency, higher cost.

As stated in the material, an agent is not magic, but a set of architectural decisions. And the quality of such an agent is determined not by one impressive model, but by how the engineer assembled the entire circuit together.

What This Means

The material is useful as an antidote to inflated expectations. If you are using or building an AI agent for development, you should look not only at the model name, but at context window, system prompt, set of tools, loop logic, and the cost of each step — that's where the real limitations and real quality are hidden.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

What do you think?
Loading comments…