
Spec Kit in Real Full-Stack: How 17 Sprints Shipped a Habit Tracker to Production

Spec-Driven Development was tested not on an MVP, but on a complete full-stack development cycle: 17 sprints, two repositories, 328 tests, and LifeSync's…

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

Spec-Driven Development (SDD) has delivered more than a one-evening demo: a complete full-stack release. The LifeSync case covers 17 sprints of work on a B2C habit and goal tracker, two separate repositories, hundreds of tests, and a production launch. Its main conclusion is that, at scale, SDD is valuable not for the speed of code generation but for keeping control when backend, frontend, and infrastructure start pulling the project in different directions.

The backend runs on Spring Boot 3.5 and Java 21 and follows a hexagonal architecture split across six Maven modules. It uses jOOQ instead of JPA, Kafka for event-driven logic, JWT with RS256 signing, and OpenAPI 3.1 as the primary API contract.

The frontend is built on React 19, TypeScript 5.9, Vite 8, TanStack React Query, Zustand, and shadcn/ui with Tailwind CSS. Over 17 sprints, the project grew to 251 server-side tests and 77 client-side tests, accumulated 19 Liquibase migrations, and went to production, with the backend deployed on Railway and the frontend on Vercel.

The central insight of the case is that scale demanded a different discipline for working with AI. Where a small project can get by with one chat for thinking and one execution environment, the full-stack project led the author to a three-chat scheme: one context for the backend, one for the frontend, and one for coordinating decisions that affect both repositories, such as deployment, cross-cutting features, API changes, and retrospectives.

This is complemented by two separate constitution files. The backend constitution grew over time from a template to 437 lines and absorbed rules on migrations, commits, style, and OpenAPI. The frontend constitution stayed much more compact at 137 lines, because the stack and patterns were more strictly defined from the start.

Special emphasis is placed on the speckit.analyze command, which the author runs as a mandatory feedback loop after each sprint. It does not replace tests and does not pretend to be a linter; instead, it cross-checks spec.md, plan.md, tasks.md, and the actual implementation against one another.

According to the author, this kind of analysis caught a critical issue in the backend Kafka sprint: in several use cases, events were published inside a transaction without a protective try-catch. The error never showed up in the test environment because Kafka in containers was always available, but in a real outage a failed publish could have rolled back data that had already been saved and left the system in an inconsistent state. After the analysis, the publishing code was guarded, the retry logic was improved, and the integration checks were expanded.
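The failure mode described above can be reproduced in a few lines. The following is a minimal sketch in plain Java, not the project's actual code: EventPublisher, InMemoryHabitStore, and CreateHabitUseCase are hypothetical names, the in-memory store only imitates a database transaction, and the broker is simulated rather than being a real Kafka client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical port for an event broker (Kafka plays this role in the article).
interface EventPublisher {
    void publish(String event);
}

// In-memory stand-in for a transactional store (an assumption for illustration):
// if the unit of work throws, nothing written inside it is committed.
class InMemoryHabitStore {
    private final List<String> committed = new ArrayList<>();

    void inTransaction(Consumer<List<String>> work) {
        List<String> pending = new ArrayList<>();
        work.accept(pending);       // an exception here skips the commit below
        committed.addAll(pending);  // commit
    }

    List<String> committed() {
        return committed;
    }
}

// Hypothetical use case showing both variants side by side.
class CreateHabitUseCase {
    private final InMemoryHabitStore store;
    private final EventPublisher publisher;
    final List<String> failedEvents = new ArrayList<>();

    CreateHabitUseCase(InMemoryHabitStore store, EventPublisher publisher) {
        this.store = store;
        this.publisher = publisher;
    }

    // Risky variant: a broker failure propagates and discards the saved habit.
    void createUnguarded(String habit) {
        store.inTransaction(tx -> {
            tx.add(habit);
            publisher.publish("habit-created:" + habit);
        });
    }

    // Guarded variant: the failure is caught and recorded, so the save commits.
    void createGuarded(String habit) {
        store.inTransaction(tx -> {
            tx.add(habit);
            try {
                publisher.publish("habit-created:" + habit);
            } catch (RuntimeException e) {
                failedEvents.add("habit-created:" + habit); // left for a later retry
            }
        });
    }
}
```

With a publisher that always throws, createUnguarded leaves the store empty, while createGuarded commits the habit and records the event as failed. Catching the exception alone only trades a rollback for a possibly lost event, which is presumably why the fix in the article was paired with improved retry logic.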

Another important pillar is the API-first approach, which serves as the contract between the two repositories. A dedicated module in the project holds an OpenAPI specification of about 2,669 lines and 32 endpoints. The backend generates Java controller interfaces from it, while the frontend treats the same YAML as the single source of truth and manually maintains TypeScript types for real UI scenarios.

The author specifically notes that speckit.analyze cannot automatically validate consistency between two repositories, so cross-repository checks still have to be done manually through the coordination context. Even so, the scheme reduces the risk of silent desynchronization between server and interface: an API change is first recorded in the specification and then surfaces during interface generation or frontend compilation.
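On the backend side, the way a contract change surfaces at compile time can be sketched as follows. This is an illustration under assumptions, not the project's code: HabitsApi stands in for an interface that a generator would emit from the OpenAPI YAML, HabitsController is a hypothetical hand-written implementation, and the framework annotations that real generated code carries are omitted.

```java
import java.util.List;

// Hypothetical stand-in for an interface generated from the OpenAPI YAML.
// In a real build this file is regenerated, never edited by hand.
interface HabitsApi {
    List<String> listHabits(String userId);
}

// Hand-written controller. If the specification changes the operation's
// parameters or return type and the interface is regenerated, this class
// no longer compiles until it is brought back in line with the contract.
class HabitsController implements HabitsApi {
    @Override
    public List<String> listHabits(String userId) {
        return List.of("habit-for-" + userId);
    }
}
```

The @Override annotation is what turns drift into a build error rather than a runtime surprise: a method that no longer matches the regenerated interface is rejected by the compiler.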

The practical conclusion from this case is quite strict: SDD scales, but only if you treat it as a change management system, not as a way to ask AI to write a feature. On short tasks you might not notice the difference, but over dozens of sprints the boring elements of the process begin to pay off: living constitutions, separate contexts, formal artifacts, and regular checks of their consistency. The case is interesting because it shows SDD not on a toy MVP, but on a project that already has a production deployment, a test loop, separate repositories, and a real cost for architectural and contract errors.
