SENAR introduces quality gates for AI development: how specifications and metrics reduce errors
The fourth part of the SENAR series on AI-agent development methodology was published on Habr. Andrey Yumashev explains why agents cannot be given a task…
AI-processed from Habr AI; edited by Hamidun News
The fourth article in the SENAR series has been published on Habr. SENAR is an open methodology for development with AI agents. Andrey Yumashev describes how formal input and output "gates" should replace the personal discipline of whoever writes the task, and how they reduce the number of errors that surface only after a task is closed.
How SENAR Works
The author describes SENAR as an engineering methodology for working with AI agents in development. It grew out of practice rather than theory: according to Yumashev, more than thirty projects ran in this mode over a year and a half, with an agent writing more and more of the code while humans handled task specification, acceptance, and failure analysis. The article's main idea is simple: an agent does not retain context between runs, follows the wording literally, and readily settles for a local optimum when a task is described carelessly.
Within a single task, SENAR relies on several mandatory elements:
- a formal task objective stated in terms of product logic
- verifiable acceptance criteria
- a separate block of negative scenarios
- the boundaries of the change and its architectural context
- signal metrics for process quality
The author emphasizes that this is not an attempt to replace tests, linters, or code review. The logic is different: ordinary checks examine the code, while gates examine the task itself before work starts and the quality of its acceptance after work ends. In TAUSIK, the practical implementation, these steps are built directly into the tool, so they cannot be skipped without circumventing the system itself. In the author's view, this protects the team from "Friday" fatigue, when the smallest tasks are the ones that most often slip into production with defects.
What the Gates Check
On input, SENAR uses the QG-0 gate. It keeps a task from starting until it has a minimum specification: an objective, acceptance criteria, negative scenarios, boundaries of changes, and a link to architectural context. Yumashev specifically disputes the popular assumption that small tasks can be handed to an agent "in one line." By his observation, these are exactly the tasks that most often break in production, because the task setter keeps important details in their head and never records them in the ticket.
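The input check described above can be pictured as a small validation step. The following is a minimal sketch, not TAUSIK's actual code: the `TaskSpec` class, its field names, and the `qg0_missing` and `qg0_allows_start` functions are all hypothetical, chosen only to mirror the five elements listed in the article.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    # Hypothetical field names; the article lists the elements, not a schema.
    objective: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)
    negative_scenarios: list[str] = field(default_factory=list)
    change_boundaries: str = ""
    architecture_link: str = ""


def qg0_missing(spec: TaskSpec) -> list[str]:
    """Return the names of required elements that are still empty."""
    checks = {
        "objective": bool(spec.objective.strip()),
        "acceptance_criteria": any(c.strip() for c in spec.acceptance_criteria),
        "negative_scenarios": any(s.strip() for s in spec.negative_scenarios),
        "change_boundaries": bool(spec.change_boundaries.strip()),
        "architecture_link": bool(spec.architecture_link.strip()),
    }
    return [name for name, present in checks.items() if not present]


def qg0_allows_start(spec: TaskSpec) -> bool:
    """Assumed gate rule: the task may start only when nothing is missing."""
    return not qg0_missing(spec)


# A "one-line" task: only the objective is filled in.
one_liner = TaskSpec(objective="Add retry to the export job")
print(qg0_missing(one_liner))
print(qg0_allows_start(one_liner))
```

In this sketch the one-line task is rejected, and the returned list tells the task setter which details are still only in their head.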
"The step was skipped not by the agent, but by me."
On output, the QG-2 gate blocks task closure until the result has been verified against the promises made on input. The author highlights three mandatory checks in the article: every acceptance criterion is confirmed by a test, a manual check, or an artifact; every manual correction made after the agent's work is recorded; and project memory is updated if the task uncovered a new edge case or an infrastructure quirk. The point is not bureaucracy. When a human fixes something silently, the agent has no record of it and repeats the same error in the next task.
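The three output checks can be sketched the same way. This is an illustration under assumptions, not the tool's real logic: `ClosureReport`, its fields, the evidence labels, and `qg2_blockers` are hypothetical names.

```python
from dataclasses import dataclass

# Evidence kinds named in the article; the string labels are assumptions.
EVIDENCE_KINDS = {"test", "manual_check", "artifact"}


@dataclass
class ClosureReport:
    evidence: dict[str, str]   # acceptance criterion -> kind of evidence
    manual_fixes_logged: bool  # True if every human correction is recorded
    new_findings: bool         # task exposed a new edge case or infra quirk
    memory_updated: bool       # project memory was updated afterwards


def qg2_blockers(criteria: list[str], report: ClosureReport) -> list[str]:
    """Return the reasons the task cannot be closed yet (empty means close)."""
    blockers = []
    for criterion in criteria:
        if report.evidence.get(criterion) not in EVIDENCE_KINDS:
            blockers.append(f"no evidence for criterion: {criterion}")
    if not report.manual_fixes_logged:
        blockers.append("manual corrections not recorded")
    if report.new_findings and not report.memory_updated:
        blockers.append("project memory not updated")
    return blockers


criteria = ["export retries 3 times", "failure is logged"]
report = ClosureReport(
    evidence={"export retries 3 times": "test"},
    manual_fixes_logged=True,
    new_findings=True,
    memory_updated=False,
)
for reason in qg2_blockers(criteria, report):
    print(reason)
```

Here the task stays open for two reasons: one criterion has no evidence, and a new finding was not written to project memory.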
Metrics and Limits
A separate section of the article is devoted to the metrics SENAR uses as signals of process state. FPSR shows the proportion of tasks solved on the first attempt; MIR shows how often manual correction was needed after the agent; DER measures dead-end branches and lost time; ERR reflects tasks that had to be fixed only after closure.
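As a rough illustration, three of these signals can be computed as simple shares of closed tasks. This is a sketch under assumptions: the `TaskRecord` fields and the choice of "all closed tasks" as the denominator are hypothetical, and DER is left out because this summary does not say how it is calculated.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Hypothetical log fields, not a schema taken from the article.
    attempts: int      # agent runs needed before the task was accepted
    manual_fix: bool   # a human corrected the agent's result
    reopened: bool     # the task had to be fixed after closure


def signal_metrics(tasks: list[TaskRecord]) -> dict[str, float]:
    """Compute FPSR, MIR and ERR as fractions of all closed tasks."""
    if not tasks:
        raise ValueError("at least one closed task is required")
    total = len(tasks)
    return {
        "FPSR": sum(t.attempts == 1 for t in tasks) / total,
        "MIR": sum(t.manual_fix for t in tasks) / total,
        "ERR": sum(t.reopened for t in tasks) / total,
    }


log = [
    TaskRecord(attempts=1, manual_fix=False, reopened=False),
    TaskRecord(attempts=1, manual_fix=False, reopened=False),
    TaskRecord(attempts=2, manual_fix=True, reopened=False),
    TaskRecord(attempts=1, manual_fix=False, reopened=False),
]
print(signal_metrics(log))
```

With this invented four-task log, three tasks passed on the first attempt and one needed a manual fix, so FPSR is 0.75, MIR is 0.25, and ERR is 0.0.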
According to the author's work log, on server tasks in a familiar domain, FPSR grew from roughly 40% to 75–80%; on the Sortule project, MIR declined from 20% to 5–7%, and ERR dropped from 15% to approximately 6%. At the same time, Yumashev is candid about the limits of the methodology. The gates help little where the result is hard to formalize: tasks about interface "feel," text tone, or product intuition. Nor do they help with external services when third-party documentation contradicts the real behavior of the API. In such cases, the formal process can preserve task structure, but it does not replace domain knowledge, manual hypothesis testing, or preliminary integration research.
What This Means
SENAR is formalized not as a set of recommendations, but as a rigid operational loop for AI development: without a proper specification, the agent does not start; without confirmed acceptance, the task does not close. For teams that are already handing code to agents, this is a strong signal: the main risk now lies not only in the model, but in the quality of task specification, project memory, and process discipline.