Habr details an AI framework for Claude with Clean Architecture and a TDD cycle
Habr published an analysis of an AI framework that makes Claude write code using stories, progress files, and TDD cycles. The author says that over 3.5…
AI-processed from Habr AI; edited by Hamidun News
Habr has published a detailed breakdown of an AI framework that makes an LLM write code according to a strict engineering process rather than in free-form "vibe coding" mode. The author says he already does almost all of his development this way, relying on Clean Architecture, TDD, and mandatory human code review.
How the flow is structured
The approach rests on a simple idea: good software is not a set of files but a set of verified behavioral scenarios. So work does not start with code generation. It starts with breaking the product down into independent stories, each delivering its own user value. For each story, Claude first goes through a specification stage: it conducts an interview, writes the description and acceptance criteria, designs the API, creates mockups, and only then compiles a list of test cases. Implementation begins only after that.
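As a rough illustration of the "specification before implementation" rule, here is a minimal Python sketch, assuming a story is tracked as a simple record of its spec artifacts. The Story class, its field names, and both helper functions are hypothetical and not taken from the author's framework; the point is only that implementation stays blocked until every artifact, ending with the test cases, exists.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Story:
    """Hypothetical record of one story's specification artifacts."""
    title: str
    description: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)
    api_design: str = ""
    mockups: list[str] = field(default_factory=list)
    test_cases: list[str] = field(default_factory=list)


# Spec artifacts in the order the article lists them; test cases come last.
SPEC_STEPS = ["description", "acceptance_criteria", "api_design", "mockups", "test_cases"]


def missing_spec_steps(story: Story) -> list[str]:
    """Return the spec artifacts that are still empty, in order."""
    return [name for name in SPEC_STEPS if not getattr(story, name)]


def can_start_implementation(story: Story) -> bool:
    """Implementation is allowed only once every spec artifact exists."""
    return not missing_spec_steps(story)
```

For example, a story that has only a description and acceptance criteria reports api_design, mockups, and test_cases as still missing, so it is not yet ready for implementation.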
The author says he has tested the process in practice for 3.5 months: roughly 4,000 commits, 1,500 tests, about 350 e2e checks, and about 25,000 lines of production code (not counting the test layer) have gone through it. The centerpiece of the scheme is the /continue command. It reads the list of stories and the progress.md file, determines the stage at which development stopped, and moves the task forward without anyone having to pick the next action manually.
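The article does not show how /continue is implemented. The sketch below assumes, purely for illustration, that progress.md is a Markdown checklist with lines like `- [x] story-1: spec` and that every story passes through a fixed, hypothetical list of stages. It then picks the first unfinished (story, stage) pair, which is the kind of decision the command is described as making.

```python
from __future__ import annotations

import re

# Hypothetical per-story stages; the real framework's stage names may differ.
STAGES = ["spec", "test-plan", "red", "green", "refactor", "review"]

# Assumed progress.md line format: "- [x] <story-id>: <stage>"
LINE = re.compile(r"^- \[(?P<done>[ xX])\] (?P<story>[\w-]+): (?P<stage>[\w-]+)\s*$")


def completed(progress_md: str) -> set[tuple[str, str]]:
    """Collect the (story, stage) pairs that are ticked off in progress.md."""
    done = set()
    for line in progress_md.splitlines():
        m = LINE.match(line.strip())
        if m and m.group("done") in "xX":
            done.add((m.group("story"), m.group("stage")))
    return done


def next_action(stories: list[str], progress_md: str) -> tuple[str, str] | None:
    """Return the first unfinished (story, stage), or None if all are done."""
    done = completed(progress_md)
    for story in stories:
        for stage in STAGES:
            if (story, stage) not in done:
                return story, stage
    return None
```

With story-1 ticked off through test-plan but not red, `next_action(["story-1", "story-2"], progress)` returns `("story-1", "red")`; once every stage of story-1 is ticked, it moves on to `("story-2", "spec")`.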
TDD and gates
The framework does not simply ask the model to "write a feature"; it walks it step by step through the ATDD cycle and its nested TDD steps. On the backend, this starts with a failing (red) acceptance test. Claude then descends a level, to the use case and the adapters, and turns each level green in turn. The same logic applies to the rest of the product: first pin down verifiable behavior, then write the code, never the other way around. This is how the author tries to tie the model's work to the architecture rather than to luck.
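To make the outside-in order concrete, here is a minimal Clean Architecture sketch in Python for a hypothetical "move a card" story on a Kanban board. The classes are illustrative only, not code from the author's demo. The acceptance test is written first and fails until the MoveCard use case and a repository adapter exist. One assumption to flag: an in-memory adapter stands in for the real persistence and HTTP adapters that a full acceptance test would exercise.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Card:
    id: str
    column: str


class CardRepository(Protocol):
    """Port: the use case depends on this interface, not on a database."""
    def get(self, card_id: str) -> Card: ...
    def save(self, card: Card) -> None: ...


class InMemoryCardRepository:
    """Adapter: a test-friendly implementation of the port."""
    def __init__(self) -> None:
        self._cards: dict[str, Card] = {}

    def get(self, card_id: str) -> Card:
        return self._cards[card_id]

    def save(self, card: Card) -> None:
        self._cards[card.id] = card


class MoveCard:
    """Use case: the business rule lives here, independent of I/O."""
    def __init__(self, repo: CardRepository, columns: set[str]) -> None:
        self._repo = repo
        self._columns = columns

    def execute(self, card_id: str, target: str) -> Card:
        if target not in self._columns:
            raise ValueError(f"unknown column: {target}")
        card = self._repo.get(card_id)
        card.column = target
        self._repo.save(card)
        return card


def acceptance_test_move_card() -> None:
    """Written first (red); passes once the use case and adapter exist (green)."""
    repo = InMemoryCardRepository()
    repo.save(Card(id="c1", column="todo"))
    MoveCard(repo, {"todo", "doing", "done"}).execute("c1", "doing")
    assert repo.get("c1").column == "doing"
```

Because the use case depends only on the CardRepository port, the same acceptance test can later run against a database-backed adapter without touching the business rule.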
- interviews and formalization of story requirements
- generation of a test plan and test cases before implementation starts
- red-green-refactor cycle for use cases, adapters, and acceptance tests
- quality gates with checks for test rigor, architecture, and coverage
- commit after each step and mandatory pause for human review
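The red-green-refactor loop with a commit after every step can be modelled as a small state machine. This is a sketch under assumptions: the Phase enum, the finish_step helper, and the commit-message format are hypothetical. It encodes the standard TDD exit conditions: a red step ends only when the new test fails, and green and refactor steps end only when the whole suite passes.

```python
from __future__ import annotations

from enum import Enum


class Phase(Enum):
    RED = "red"
    GREEN = "green"
    REFACTOR = "refactor"


# After refactoring, the cycle starts again with the next failing test.
NEXT = {Phase.RED: Phase.GREEN, Phase.GREEN: Phase.REFACTOR, Phase.REFACTOR: Phase.RED}


def finish_step(phase: Phase, new_test_failed: bool, suite_passed: bool,
                level: str) -> tuple[Phase, str]:
    """Check the exit condition of one TDD step.

    Returns the next phase and a (hypothetical) commit message for the step
    just completed. Raises RuntimeError if the step is not actually done.
    """
    if phase is Phase.RED and not new_test_failed:
        raise RuntimeError("red: the new test must fail before code is written")
    if phase is not Phase.RED and not suite_passed:
        raise RuntimeError(f"{phase.value}: the whole suite must pass")
    return NEXT[phase], f"{level}: {phase.value}"
```

After each successful call, the caller would commit with the returned message and then stop for human review, as the list above describes.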
"Bare prompts don't get you far."
On top of TDD, the author added quality gates as a safety net. At the red step, the model separately checks how strict the tests are and whether they violate the test architecture; at the green step, it refactors the code and verifies coverage. Each gate has a checklist that the LLM must explicitly work through. Special attention goes to context management: each substep runs in a separate agent, and progress.md makes it possible to reset the context after every commit and load only the minimum data needed for the next pass.
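Checklist-driven gates can be sketched as follows, assuming each gate is a named list of yes/no items that the agent must answer one by one. The Gate class and the item wording are hypothetical paraphrases of the red-step and green-step checks described above. The key design choice is that an unanswered item counts as a failure, so the model cannot pass a gate by skipping part of the checklist.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Gate:
    """A quality gate: a named checklist the agent must answer item by item."""
    name: str
    checklist: list[str]
    answers: dict[str, bool] = field(default_factory=dict)

    def confirm(self, item: str, ok: bool) -> None:
        """Record an explicit yes/no answer for one checklist item."""
        if item not in self.checklist:
            raise KeyError(f"not on the {self.name} checklist: {item}")
        self.answers[item] = ok

    def passed(self) -> bool:
        # An unanswered item counts as a failure, not as an implicit pass.
        return all(self.answers.get(item) is True for item in self.checklist)


def red_gate() -> Gate:
    # Hypothetical wording of the red-step checks mentioned in the article.
    return Gate("red", [
        "tests are strict enough to fail on wrong behavior",
        "tests follow the project's test architecture",
    ])


def green_gate() -> Gate:
    # Hypothetical wording of the green-step checks mentioned in the article.
    return Gate("green", [
        "code has been refactored",
        "coverage of the new code has been verified",
    ])
```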
Parallel IDE windows
The practical side of the approach matters as much as the architecture. A single /continue run can take 5 to 20 minutes, and with heavy tests up to 40 minutes or even an hour. To avoid waiting on a single thread, the author suggests cloning the demo repository several times, or using git worktrees, and running different stories in parallel in separate IDE windows.
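For the worktree option, here is a minimal sketch that builds one `git worktree add` command per story. The directory layout and the story/<id> branch naming are assumptions for illustration; the git syntax itself (`git -C <repo> worktree add -b <branch> <path>`) is standard. Each resulting checkout can be opened in its own IDE window.

```python
from __future__ import annotations

from pathlib import Path


def worktree_commands(repo: Path, stories: list[str]) -> list[list[str]]:
    """Build one `git worktree add` command per story (not executed here).

    Each worktree is a separate checkout of the same repository on its own
    branch, so it can be opened in its own IDE window and driven by its own
    agent run without interfering with the others.
    """
    commands = []
    for story in stories:
        # Hypothetical layout: sibling directories named <repo>-<story>.
        path = repo.parent / f"{repo.name}-{story}"
        commands.append([
            "git", "-C", str(repo), "worktree", "add",
            "-b", f"story/{story}", str(path),
        ])
    return commands
```

Each command can then be executed with `subprocess.run(cmd, check=True)`, and `git worktree list` shows all active checkouts.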
In his own workflow, up to six windows are open at once: four or five run agents, and the rest are used for review, refactoring, and paying down technical debt. As entry points, the author has published two repositories: a demo Kanban board built with Java and React, and an empty framework scaffold.
For non-standard cases, there are additional commands alongside the main flow. /task handles bug fixes, infrastructure work, and long tasks that do not fit into a story with the full TDD ritual. /prompt-update is for improving the framework itself when the model keeps stumbling over the same problem.
The author is candid about the limitations: the solution grew out of a specific web stack, is tailored to Clean Architecture and ATDD, consumes a lot of tokens, and does not make development fully autonomous. A bug that slips through review will go straight to production.
What it means
The approach shows where mature AI-assisted development is heading: from one-off prompts to reproducible pipelines in which the model works inside the process rather than in place of it. The main takeaway is fairly strict: LLMs can already speed up production development, but only when they are kept within the bounds of tests, checklists, short steps, and constant human oversight.