Habr AI described the architecture of a reflective agent: less autopilot, more self-checking
AI-processed from Habr AI; edited by Hamidun News
Habr AI has published a breakdown of the architecture of a reflective AI agent: a system in which the model does not simply invoke tools but goes through a mandatory verification cycle before each important action. The main idea is straightforward: in tasks where the cost of error is high, an agent is more useful when it is the most predictable, not the fastest.
Why speed alone is not enough
A typical agent operates on a think → do scheme: it quickly grasps the task and immediately reaches for files, CRM, email, or the terminal. In a demo this looks impressive, but in a real environment this mode quickly runs into systemic failures. One hallucination turns into an action, one inaccuracy drags a cascade of errors behind it, and a misunderstood goal leads the model to ignore side effects. In the end, the system looks confident only until its first contact with incomplete, noisy, or contradictory data.
The article's author proposes looking at the problem not as a deficit of the model's "intelligence," but as a deficit of architecture. Even a strong LLM makes mistakes if it lacks a built-in pause for double-checking, clear access boundaries, and a rollback mechanism. The proposal is therefore not a new prompt but a new execution loop that forces reflection in between perception, decision, and action. This is precisely why speed alone is no longer considered a sufficient advantage.
How the cycle works
Instead of a direct "understood, done" link, Habr AI describes a seven-step cycle in which, on each iteration, the agent gathers fresh context, builds a plan, forms a draft action, checks itself, and only then commits. If data is insufficient, it can pause, ask a clarifying question, or wait for a human response without losing session state. This makes the agent behave less like an autopilot and more like a careful assistant who knows how to put a task on hold.
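The loop can be illustrated with a minimal Python sketch. It covers only the stages named above (context, plan, draft, self-check, commit, plus the pause), not the full seven steps of the original article, and every name in it (`Session`, `Outcome`, `run_iteration`, the email example) is a hypothetical illustration rather than the author's code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    COMMITTED = auto()
    PAUSED = auto()    # waiting for a human answer; session state is kept
    REJECTED = auto()  # the self-check failed; nothing was applied


@dataclass
class Session:
    history: list = field(default_factory=list)  # survives pauses
    pending_question: Optional[str] = None


def run_iteration(session, task, facts):
    """One pass through the stages named in the text (hypothetical helper)."""
    # 1. Gather fresh context on every iteration.
    context = {"task": task, "facts": dict(facts), "history": list(session.history)}

    # Not enough data: pause and ask instead of guessing.
    if "recipient" not in context["facts"]:
        session.pending_question = "Who should receive the message?"
        return Outcome.PAUSED

    # 2. Plan, then 3. draft the action without executing it.
    plan = ["compose", "send"]
    draft = {"action": "send_email", "to": context["facts"]["recipient"],
             "body": task, "plan": plan}

    # 4. Reflection: check the draft before anything is applied.
    problems = []
    if not draft["body"].strip():
        problems.append("empty body")
    if "@" not in draft["to"]:
        problems.append("recipient is not an email address")
    if problems:
        session.history.append(("rejected", problems))
        return Outcome.REJECTED

    # 5. Commit only after the check passes.
    session.history.append(("committed", draft))
    session.pending_question = None
    return Outcome.COMMITTED
```

Calling `run_iteration` with no recipient returns `Outcome.PAUSED` and leaves a question in the session; calling it again on the same `Session` with the missing fact lets the task continue from the preserved state.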
Several key nodes stand out in the architecture:
- Dynamic context: before each step, the agent gathers the available objects, tools, constraints, and session history afresh.
- Draft changes: any edits first live in a temporary layer instead of going straight to production.
- Reflection phase: before finishing, the agent must check whether it has skipped steps, violated the format, or contradicted itself.
- Confirmation gate: risky operations are halted until a human explicitly agrees.
- Commit and rollback: after approval, changes are applied atomically, and on failure the state can be restored from a snapshot.
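The draft layer and the commit and rollback node from the list above can be sketched in a few lines of Python. This is an assumption about how such a layer might look: `DraftStore` and its methods are hypothetical names, and the "atomic" commit here is simply an all-or-nothing swap of an in-memory dictionary.

```python
import copy


class DraftStore:
    """Keeps edits in a temporary layer until they are committed (hypothetical)."""

    def __init__(self, state):
        self.state = state    # the "live" data
        self.draft = {}       # pending edits, not yet visible in state
        self.snapshots = []   # earlier states, kept for rollback

    def stage(self, key, value):
        """Record an edit in the draft layer only."""
        self.draft[key] = value

    def commit(self, validate):
        """Apply all staged edits at once, or none of them."""
        candidate = copy.deepcopy(self.state)
        candidate.update(self.draft)
        self.draft.clear()  # the draft is consumed either way
        if not validate(candidate):
            return False    # live state stays untouched
        self.snapshots.append(copy.deepcopy(self.state))
        self.state = candidate
        return True

    def rollback(self):
        """Restore the most recent snapshot, if there is one."""
        if not self.snapshots:
            return False
        self.state = self.snapshots.pop()
        return True
```

Staged values never touch `state` until `commit` succeeds, so a failed validation costs nothing, and `rollback` undoes the last successful commit.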
Equally important is the idea of a universal tool protocol. Through a single interface, such an agent can be connected to a file system, terminal, databases, CRM, browsers, payment services, or industry-specific reference materials. The logic doesn't change: first gather context, then plan, then check, and only then act. As a result, the same scheme carries over from development to law, medicine, analytics, and support without rebuilding the core.
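One way to picture a single tool interface is a small Python `Protocol` plus a registry. The article does not specify the protocol's shape, so `Tool`, `ToolRegistry`, and the two toy tools below are assumptions made purely for illustration.

```python
from typing import Any, Protocol


class Tool(Protocol):
    """The one interface every tool is assumed to implement."""
    name: str

    def describe(self) -> str: ...
    def call(self, **kwargs: Any) -> Any: ...


class InMemoryFiles:
    name = "files"

    def __init__(self):
        self._files = {}

    def describe(self):
        return "read and write text files held in memory"

    def call(self, **kwargs):
        op = kwargs["op"]
        if op == "write":
            self._files[kwargs["path"]] = kwargs["text"]
            return True
        if op == "read":
            return self._files[kwargs["path"]]
        raise ValueError(f"unknown op: {op}")


class Calculator:
    name = "calc"

    def describe(self):
        return "add two numbers"

    def call(self, **kwargs):
        return kwargs["a"] + kwargs["b"]


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool: Tool):
        self._tools[tool.name] = tool

    def context(self):
        """Tool descriptions the agent can read when gathering context."""
        return {name: tool.describe() for name, tool in self._tools.items()}

    def invoke(self, tool_name, /, **kwargs):
        if tool_name not in self._tools:
            raise KeyError(f"no such tool: {tool_name}")
        return self._tools[tool_name].call(**kwargs)
```

Because the core only ever sees `describe` and `call`, adding a CRM or a database connector in this sketch means registering one more object, not changing the loop.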
Where the safeguards are
In the article, safety is placed not in an output filter but inside the execution cycle itself. All actions are divided by risk level: safe reads pass automatically, changes create drafts, and destructive operations require separate confirmation. This matters for publications, mass mailings, data deletion, financial transactions, and any steps that can't simply be undone with a back button.
In such a scheme the human is not an observer but the holder of the final veto.
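The three risk levels map naturally onto a small routing function. The sketch below is a guess at the idea, not the article's implementation: the action names, the `RISK_BY_ACTION` table, and the decision to treat unknown actions as destructive are all assumptions.

```python
from enum import Enum


class Risk(Enum):
    SAFE_READ = "safe_read"
    CHANGE = "change"
    DESTRUCTIVE = "destructive"


# Hypothetical classification table; a real system would define its own.
RISK_BY_ACTION = {
    "read_file": Risk.SAFE_READ,
    "edit_file": Risk.CHANGE,
    "delete_records": Risk.DESTRUCTIVE,
    "send_payment": Risk.DESTRUCTIVE,
}


def route(action, confirmed_by_human=False):
    """Decide what the execution layer does with a requested action."""
    # Assumption: anything unclassified is treated as the highest risk.
    risk = RISK_BY_ACTION.get(action, Risk.DESTRUCTIVE)
    if risk is Risk.SAFE_READ:
        return "execute"
    if risk is Risk.CHANGE:
        return "draft"
    return "execute" if confirmed_by_human else "await_confirmation"
```

For example, `route("edit_file")` yields `"draft"`, while `route("send_payment")` stays at `"await_confirmation"` until it is called with `confirmed_by_human=True`.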
On top of this layer, technical safeguards operate: Scope Jail keeps the agent within permitted resources, a loop detector stops repetitive actions, iteration and token limits cut off runaway scenarios, and snapshots allow a session to be rolled back to its previous state. Even if the model suggests a dangerous move, the final decision rests not with the generated text but with the execution layer, which checks every tool call. This reduces the cost of error and makes the agent loop more suitable for production.
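A rough Python sketch of an execution-layer check that combines a scope jail, a loop detector, and an iteration limit is shown below. `ExecutionGuard` and `GuardError` are hypothetical names, the path check is purely lexical (it does not resolve symlinks), and token limits are left out for brevity.

```python
import posixpath
from collections import Counter
from pathlib import PurePosixPath


class GuardError(Exception):
    """Raised when the execution layer refuses a tool call."""


class ExecutionGuard:
    def __init__(self, allowed_root, max_iterations=20, max_repeats=3):
        self.allowed_root = PurePosixPath(posixpath.normpath(allowed_root))
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.iterations = 0
        self.seen = Counter()

    def check(self, tool, path):
        """Validate one tool call; return the normalised path or raise."""
        # Iteration limit: cut off runaway sessions.
        self.iterations += 1
        if self.iterations > self.max_iterations:
            raise GuardError("iteration limit reached")

        # Scope jail: collapse ".." and reject anything outside the root.
        joined = posixpath.join(str(self.allowed_root), path)
        target = PurePosixPath(posixpath.normpath(joined))
        if target != self.allowed_root and self.allowed_root not in target.parents:
            raise GuardError(f"{target} is outside {self.allowed_root}")

        # Loop detector: stop identical calls repeated too many times.
        key = (tool, str(target))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            raise GuardError(f"loop detected: {tool} on {target}")
        return str(target)
```

The guard never looks at what the model "said"; it only inspects the concrete call, which is the point made above about the decision resting with the execution layer.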
"The smartest agent is not one that does everything itself"
This scheme fits well in domains where errors are costly: law, medicine, finance, support, marketing, infrastructure. The same pattern holds everywhere: first a hypothesis, then verification against the rules, then action within the permissions granted. An agent can prepare a conclusion, a draft answer, a work plan, or a set of edits, but the final step always remains deliberate and verifiable.
This is precisely what distinguishes an assistant from an unconditional executor.
What this means
The architecture of a reflective agent is an attempt to move AI from impressive-demo mode to working-tool mode. For the market, this is an important signal: the winners will be not only the "smartest" models, but also the systems that have a pause for self-checking, a transparent action log, rollback, and a human in the loop. It is precisely such agents that have a chance of working reliably in production, and not just impressing with speed in presentations.
This is no longer magic in the interface, but an engineering approach to autonomy.