Habr AI showed how to prepare structured input for an AI agent instead of a raw technical spec
AI-processed from Habr AI; edited by Hamidun News
Habr AI published a detailed breakdown of what needs to be fed to an AI agent when the task is to verify technical requirements rather than simply paraphrase them. The main idea: instead of receiving an entire specification document, the agent gets a set of atomic requirements in the form of JSON passports.
Why one specification isn't enough
The article's author starts with a problem familiar to almost everyone who has tried giving a neural network a large document in full. When a model receives a multi-page specification without prior preparation, it loses focus, mixes requirements from different sections, and provides overly general remarks. As a result, the system may notice individual inaccuracies, but it struggles to explain which specific point is problematic, why it's problematic, and what needs to be fixed.
That's why the document is first cut into separate requirements: one action, one rule, or one constraint per fragment. To avoid losing context during this fragmentation, each element is augmented with fields like `parent_section` and `parent_object`, and logically related points are marked as `linked`. This matters when several requirements must be checked together: for example, when the system must send a notification via both email and Telegram, not just through one channel.
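A minimal sketch of what one such fragment could look like in Python. The field names `parent_section`, `parent_object`, and `linked` come from the article; the `Requirement` class, the `req_id` field, and the `link` helper are assumptions made for illustration, not the author's code.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One atomic requirement cut out of a larger specification."""
    req_id: str          # hypothetical identifier, not from the article
    text: str            # exactly one action, rule, or constraint
    parent_section: str  # section the fragment was cut from
    parent_object: str   # entity the requirement is about
    linked: list[str] = field(default_factory=list)  # ids to check together


def link(a: Requirement, b: Requirement) -> None:
    """Mark two requirements as logically related, in both directions."""
    if b.req_id not in a.linked:
        a.linked.append(b.req_id)
    if a.req_id not in b.linked:
        b.linked.append(a.req_id)


email = Requirement("R1", "The system must send a notification via email.",
                    "Notifications", "Notification")
telegram = Requirement("R2", "The system must send a notification via Telegram.",
                       "Notifications", "Notification")
link(email, telegram)  # the two channels must be verified as a pair
```

Storing links as ids rather than object references keeps each fragment easy to serialize to JSON later.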
Requirement passport
The next step is to convert human language into a set of features that a classifier can work with. In this scheme, the LLM doesn't serve as the final judge and doesn't attempt to "understand everything." Its role is much narrower: it extracts structured signals from the text and collects them in JSON. This approach provides control: features can be checked, compared, and corrected in post-processing if necessary. As the author puts it:
"the agent works not with text, but with such structures."
The article describes six basic features that form the foundation of this passport; five of them are listed below. Instead of an abstract quality assessment, the model looks for specific signals: numbers, vague words, exceptions, boundaries, and explicit scenario participants. In practice, such a passport transforms a phrase like "the user should flexibly configure the report" into an understandable set of flags that immediately show what the requirement is missing. This interpretability is precisely what distinguishes the scheme from simply asking a model to evaluate text as a whole.
- `has_numbers` — whether the requirement contains numbers, limits, dates, and other specific parameters
- `stopword_score` — how vague the formulation is due to words like "flexible," "convenient," or "fast"
- `has_negative_keywords` — whether exceptions and errors are described
- `boundary_conditions_mentioned` — whether empty values, maximums, minimums, or other boundaries are specified
- `actor_count` — how many participants are explicitly mentioned in the requirement
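To make the fields listed above concrete, here is a rough sketch of a passport for the phrase "the user should flexibly configure the report". The field names are from the article; the `Passport` class, the `VAGUE_WORDS` stems, and the `vague_share` scoring rule are assumptions for illustration. In the described scheme the LLM produces these values, whereas here most of them are filled in by hand.

```python
import json
import re
from dataclasses import asdict, dataclass

# Hypothetical stems; the article names only "flexible", "convenient", "fast".
VAGUE_WORDS = ("flexib", "convenient", "fast")


@dataclass
class Passport:
    has_numbers: bool
    stopword_score: float
    has_negative_keywords: bool
    boundary_conditions_mentioned: bool
    actor_count: int


def vague_share(text: str) -> float:
    """Share of words that start with a vague stem (an assumed scoring rule)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.startswith(VAGUE_WORDS))
    return round(hits / len(words), 2)


text = "The user should flexibly configure the report"
passport = Passport(
    has_numbers=bool(re.search(r"\d", text)),
    stopword_score=vague_share(text),
    has_negative_keywords=False,          # no errors or exceptions described
    boundary_conditions_mentioned=False,  # no empty, minimum, or maximum values
    actor_count=1,                        # only "the user" is named
)
print(json.dumps(asdict(passport), indent=2))
```

Every flag in the output points at something the requirement lacks: no numbers, no error handling, no boundaries, and a non-zero vagueness score.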
The features themselves are extracted through JSON mode and few-shot examples to keep the model from deviating from the format. If the LLM still misses something obvious, such as numbers in the text, the corresponding flag is corrected in post-processing with regular expressions. Next comes a decision tree: it receives the numerical features and assigns the requirement a label such as `ok`, `unverifiable`, `no_negative`, `no_boundary`, or `ambiguity`. For training, the author labeled 90 specifications, split them into 270 requirements, and achieved approximately 82% accuracy on the test set.
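The post-processing and labeling steps can be sketched as follows. This is not the author's trained model: `classify` is a hand-written stand-in for the decision tree, and its threshold and the mapping from flags to labels are assumptions. Only the label names and the idea of a regex fallback for `has_numbers` come from the article.

```python
import re

NUMBER_RE = re.compile(r"\d")


def enforce_has_numbers(text: str, passport: dict) -> dict:
    """Return a copy of the passport with has_numbers forced to True
    when the text contains a digit the LLM overlooked."""
    fixed = dict(passport)
    if NUMBER_RE.search(text):
        fixed["has_numbers"] = True
    return fixed


def classify(passport: dict) -> str:
    """Hand-written stand-in for the trained tree; the threshold is made up."""
    if passport["stopword_score"] > 0.1:
        return "ambiguity"
    if not passport["has_numbers"]:
        return "unverifiable"
    if not passport["has_negative_keywords"]:
        return "no_negative"
    if not passport["boundary_conditions_mentioned"]:
        return "no_boundary"
    return "ok"


text = "The report must load within 3 seconds for up to 500 rows."
llm_passport = {
    "has_numbers": False,  # simulated LLM miss
    "stopword_score": 0.0,
    "has_negative_keywords": False,
    "boundary_conditions_mentioned": True,
    "actor_count": 1,
}
fixed = enforce_has_numbers(text, llm_passport)
label = classify(fixed)
```

Because the regex only ever switches the flag on, it cannot hide a number the LLM did find. A real tree would learn its splits from the labeled requirements instead of using fixed rules like these.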
Critic and scale
The pipeline doesn't end there. Even a good classifier sees only one requirement at a time, which means it can easily miss contradictions between sections. For such cases, a separate critic agent is used, which receives the full specification text, the list of JSON passports, and the predicted labels.
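One way to bundle those three inputs for the critic is shown below. The function name `build_critic_prompt`, the prompt wording, and the layout of the payload are assumptions for illustration; the article specifies only which inputs the critic receives.

```python
import json


def build_critic_prompt(spec_text: str, passports: list[dict],
                        labels: list[str]) -> str:
    """Combine the full text, the passports, and the labels into one prompt."""
    if len(passports) != len(labels):
        raise ValueError("each passport needs exactly one predicted label")
    items = [{"passport": p, "label": l} for p, l in zip(passports, labels)]
    return (
        "Look for conflicts between requirements, gaps in access rights, "
        "and errors in integration mapping.\n\n"
        "SPECIFICATION:\n" + spec_text + "\n\n"
        "REQUIREMENTS:\n" + json.dumps(items, ensure_ascii=False, indent=2)
    )


spec = ("3.1 The 'Warehouse' field is mandatory.\n"
        "5.2 The 'Warehouse' field may be left empty.")
passports = [
    {"text": "The 'Warehouse' field is mandatory.", "parent_section": "3.1"},
    {"text": "The 'Warehouse' field may be left empty.", "parent_section": "5.2"},
]
prompt = build_critic_prompt(spec, passports, ["ok", "ok"])
```

Both requirements can look fine in isolation, which is why the critic sees them side by side with the full text.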
Its task is not to re-evaluate each phrase from scratch, but to view the document from above and search for conflicts, gaps in access rights, and errors in integration mapping. Such a critic might, for example, notice that in one place the "Warehouse" field is mandatory, while in another an empty value is allowed.
To make the scheme work on more than short examples, requirements are processed in parallel through `ThreadPoolExecutor`, and local models are run in Ollama. The author notes that on a typical gaming PC, the system comfortably handles 4–6 parallel requests without noticeable degradation, and on a batch of a hundred requirements this gives a speedup of roughly 3–4 times. Related requirements stay in a single thread to preserve order and the shared checking context.
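A minimal sketch of this scheduling idea with the standard library's `ThreadPoolExecutor`. The function names are hypothetical, and `check_requirement` is a placeholder for the real per-requirement call to a local model, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def check_requirement(text: str) -> str:
    """Placeholder for the real per-requirement check (e.g. a local model call)."""
    return f"checked: {text}"


def check_group(group: list[str]) -> list[str]:
    """Linked requirements are processed sequentially inside one worker."""
    return [check_requirement(text) for text in group]


def check_all(groups: list[list[str]], max_workers: int = 4) -> list[list[str]]:
    """Run groups in parallel; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check_group, groups))


groups = [
    ["Send a notification via email.", "Send a notification via Telegram."],
    ["The 'Warehouse' field is mandatory."],
]
results = check_all(groups)
```

Submitting whole groups rather than single requirements keeps linked items in one thread, and `max_workers=4` sits at the lower end of the 4–6 range the author reports.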
What it means
The breakdown on Habr AI clearly shows where practical AI agent development is heading: from attempts to "feed the model everything at once" to narrow, controlled pipelines with explicit features, local models, and a separate arbitration layer. If a team wants to build a practical agent for analytics, QA, or documentation work, it will need to think not only about choosing a model, but also about how input data, labeling, and final result verification are structured.