
Testing Pyramid as a Task Decomposition Tool for AI Agents in QA Assist

The QA Assist system of 11 AI agents faced a classic problem: a language model cannot cover an entire project in a single request due to context window…

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

When a language model becomes a test designer, classical QA theory unexpectedly acquires a new dimension. This is the topic of the third article by Mikhail Fedorov in his series about the QA Assist system, published on Habr. This time, the author explains why the testing pyramid, conceived long before the era of neural networks, turns out to be critically important for AI agents with limited context windows.

QA Assist is a system of 11 specialized AI agents designed to automate software testing. In the first article of the series, Fedorov described the architecture: how the agents are divided by responsibility, how they interact, and what they can do. In the second, he candidly described what implementation looks like in a corporate environment: a task that looks like four hours of work on paper turns into a week of approvals, meetings with security specialists, and infrastructure config fixes.

The third article takes a step back and asks how to formulate tasks for an AI so that the result is high-quality and reproducible.

The testing pyramid is one of the fundamental principles of software development. At the base are fast, inexpensive unit tests that verify functions and methods in isolation. In the middle are integration tests that verify how components interact. At the top are slow, expensive end-to-end (E2E) tests that simulate real user scenarios. The classic ratio is many unit tests, fewer integration tests, and a minimal number of E2E tests.

This structure saves test execution time and simplifies debugging: when a unit test fails, it is immediately clear what broke.

The problem arises when a language model, rather than an engineer, designs the tests. An LLM operates within a context window: a fixed number of tokens that the model can hold in a single generation session. For small tasks, this is not critical. But if you ask a neural network to write a complete test suite for a large application in one request, the outcome is predictable: either part of the business logic falls outside the context window, or the model produces shallow scenarios that skip real dependencies and edge cases.

This is where the testing pyramid stops being textbook theory and becomes a practical tool for task decomposition. The author's metaphor, feeding an elephant to a neural network piece by piece, captures the essence of the approach. A large task is broken down into layers according to the pyramid: first, agents generate unit tests at the function level, then move on to integration scenarios, and finally to E2E. Each layer fits within the model's context window and is processed in isolation, without the quality loss caused by context overflow.
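As an illustration only, the layer-by-layer split can be sketched in Python. This is not code from QA Assist: the names `SourceItem`, `estimate_tokens`, and `plan_batches` are hypothetical, and the "about four characters per token" estimate is a crude assumption standing in for a real tokenizer.

```python
from dataclasses import dataclass

LAYERS = ("unit", "integration", "e2e")  # pyramid order, base first


@dataclass(frozen=True)
class SourceItem:
    name: str   # a function, a pair of components, or a user scenario
    layer: str  # one of LAYERS
    text: str   # the code or description the model needs to see


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. A real system
    # would use the tokenizer of the model it actually calls.
    return max(1, len(text) // 4)


def plan_batches(items, budget):
    """Group items layer by layer into batches that fit the token budget."""
    batches = []
    for layer in LAYERS:
        current, used = [], 0
        for item in (i for i in items if i.layer == layer):
            cost = estimate_tokens(item.text)
            if cost > budget:
                raise ValueError(f"{item.name} alone exceeds the budget")
            # Close the current batch before it would overflow the budget.
            if current and used + cost > budget:
                batches.append((layer, current))
                current, used = [], 0
            current.append(item.name)
            used += cost
        if current:
            batches.append((layer, current))
    return batches
```

Each `(layer, names)` pair returned by `plan_batches` corresponds to one request to the model. Unit batches always come first, and no single request is asked to hold more than the assumed budget.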

This approach provides several concrete advantages. Each request to the model becomes focused: the agent receives a clear scope, a defined input contract, and a specific output artifact. Errors are localized: if a unit test is written incorrectly, this is visible immediately, not several iterations later, when an integration scenario is already being built on top of it. Finally, the pyramid establishes a natural order of dependencies: E2E tests are built on top of a verified foundation, not in parallel with it.

Fedorov does not claim to have reinvented the wheel. He himself acknowledges that this is the application of a long-known engineering principle to a new context. But therein lies the main idea: AI does not abolish basic development principles; it makes them even more significant. Understanding the testing pyramid is now necessary not only for QA engineers, but also for those who design the architecture of requests to language models.

For teams considering AI tools for test automation, this is a practical lesson: first design the task decomposition, then entrust it to the model.
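One possible way to write such a decomposition down before any model is called is sketched below. It is an assumption-laden illustration, not the QA Assist implementation: `AgentTask`, `next_layer`, and `can_run` are hypothetical names. The sketch models a focused request (scope, input contract, output artifact) and the rule that a layer runs only on top of verified lower layers.

```python
from dataclasses import dataclass

PYRAMID_ORDER = ("unit", "integration", "e2e")  # base of the pyramid first


@dataclass(frozen=True)
class AgentTask:
    layer: str            # pyramid layer this request belongs to
    scope: str            # what exactly the agent should cover
    input_contract: str   # what the agent is given
    output_artifact: str  # what the agent must produce


def next_layer(verified):
    """Return the lowest pyramid layer that is not yet verified, or None."""
    for layer in PYRAMID_ORDER:
        if layer not in verified:
            return layer
    return None


def can_run(task, verified):
    """A task may run only when every layer below it has been verified."""
    below = PYRAMID_ORDER[:PYRAMID_ORDER.index(task.layer)]
    return all(layer in verified for layer in below)
```

With this gate, an E2E task is rejected until both the unit and integration layers are marked as verified, which mirrors the dependency order described above.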

An elephant is eaten piece by piece, and this is not a limitation of the technology but the only working architecture.

