Habr AI: how expensive LLMs became state managers and reduced development costs

Habr AI published a practical case study on why the popular "orchestrator + coders" pattern breaks down in real AI development. The team abandoned the idea that the strongest model should write code: instead, they turned the expensive LLM into the state manager and handed routine work to a low-cost worker. This helped reduce error cycles, stabilize context, and significantly lower API costs.

Khamidun Zhemal

AI monitoring · Habr AI

Apr 30, 2026· 3 min

AI-processed from Habr AI; edited by Hamidun News

Habr AI: how expensive LLMs became state managers and reduced development costs — Source: Habr AI. Collage: Hamidun News.

◐ Listen to article

On Habr AI, an analysis of AI development architecture was published, in which an expensive LLM no longer writes code itself, but instead manages a cheaper executor. The author argues that such restructuring helped eliminate endless error loops, reduce context size, and significantly lower API expenses.

Where Agents Broke

The team started with a popular path: they took Aider and integrated it into CI/CD so the agent could automatically update documentation after code changes. For this task, the tool worked well and addressed some technical debt. But when they tried to give it the full development cycle — from backlog and specifications to code and tests — the system quickly ran into limitations. There were two main problems: weak control over integrations and unclear artifact transfer between steps, when an agent created something but it was difficult to reliably extract and integrate into the next stage.

The next attempt was closer to a familiar organizational structure: one orchestrator sets the task, and several agent-coders execute it in parts. On paper, the schema looked logical, but in practice it produced two systemic failures. The first was a breakdown of responsibility: if one agent didn't complete part of a feature, and the next one already built its logic on top, the error started to cascade. The second was analytical paralysis. The models endlessly read the repository, double-checked files, and delayed actual changes while context bloated and the token bill grew.

Why Change Roles

During testing, the team noticed that different models indeed have a distinct work "character". Gemini 3 Pro acts like an overly confident developer and can deviate from the original specification. MiniMax M2.5, by contrast, is cautious and reads half the project before taking the first step. Claude Sonnet 4.6 showed the best balance between autonomy and discipline, but using it for every small action turned out to be too expensive for a startup.

This is where the new idea grew: a strong model should be assigned not to routine, but to control.

"The CEO doesn't make cold calls."

Instead of a schema where an expensive LLM writes the most complex code, the team introduced several strict rules:

One agent leads one specification from start to finish and fixes its own errors.
An agent works only with a limited "workbench" of 5-8 files, not the entire repository.
When closing a file, it saves a brief memory of useful findings to avoid dragging entire source code into context.
The smartest model doesn't code directly, but acts as a state manager for a cheap worker.

How the Manager Works

In the new architecture, a cheap and fast LLM acts as a worker: it writes code, calls tools, receives compilation errors, and makes routine passes. When the worker hits a problem or reaches its action limit, control is taken over by the expensive model — the state manager. It doesn't fix code directly, but reads the accumulated history, filters out noise, and assembles a compact, useful version of the context for the next step.

The state manager does four things in sequence:

Briefly records what has actually been done and what works.
Updates memory: variables, decisions, found library conflicts, and dead ends.
Checks whether it makes sense to continue, or if the task has hit tool limitations.
Formulates a clear directive for how the worker should move forward and bypass errors.

The most interesting technique is how these instructions are conveyed. The manager's recommendations, besides the memory block, are presented to the worker as a new user message. Because of this, the executor perceives the instructions as high-priority and objects less to them. In parallel, the system clears the worker's past conversation with logs and errors to start a new cycle with a "clean window".

There is risk in this approach: if the manager misinterprets the logs and writes a false fact into memory, the worker will stubbornly follow an erroneous course. But the author writes that in the analytical role, the expensive model hallucinates much less frequently than in direct code generation.

An additional effect — tests and documentation begin to appear alongside the task by default, and developers shift from the role of manual executors to the role of operators and process architects.

What It Means

This case demonstrates well that success in AI development comes not only from choosing a model, but also from correctly distributing roles between them. If you use an expensive LLM as a memory dispatcher, decision controller, and circuit breaker for meaningless loops, you can simultaneously increase process stability and reduce the cost of rework.

For teams that have already been burned by "autonomous programmers," this is one of the most practical architectural conclusions of recent months.

Hamidun News

AI news without noise. Daily editorial selection from 50+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Need AI working inside your business — not just in your newsfeed?

I build production AI for companies — custom CRM, internal tools, autonomous agents, workflow automation. Owned by you, shaped to your process, no per-seat tax. Built by Zhemal Khamidun, CPO of AlpinaGPT (AI platform, 6,000+ users).

Book a free consultation →