
An open-source stack of 6 models and 9 agents showed how to build an AI team on a single server

One GPU server, six open-source models, and nine agents — that is what an autonomous AI team looks like when it designs, writes, tests, and deploys new…

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

An autonomous team of nine AI agents can design, write, test, and deploy new agents without human involvement. It does not depend on closed APIs: the setup is built on six open-source models and, in its basic configuration, fits on a single GPU server.

How the team is structured

Instead of one "universal" model, the author assembled a pipeline of nine roles. Some agents handle task formulation and architecture; others write code, check quality, run tests, and deploy. The result is not one large assistant but a small engineering organization in which each participant does a narrow piece of work. This keeps the process manageable: no single agent has to plan the system, write the modules, run the tests, and judge its own mistakes at the same time.

The key idea is that autonomy comes not from magic but from splitting the process into stages. If an agent is responsible only for its own part and receives an already structured task as input, the requirements for its model become clearer. The orchestrator must be able to reason and keep track of context, the builder must generate code reliably, and the critic must spot problems in tool use and execution scenarios. The gain therefore comes not from one super-model but from assembling the roles precisely into a working pipeline.
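To make the staged handoff concrete, here is a minimal Python sketch. It assumes a hypothetical `Task` record passed from stage to stage; the role functions are stubs standing in for model calls, and none of the names come from the author's code. It only illustrates the structural point: each stage reads and writes a structured task instead of sharing one free-form conversation.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical structured task passed between stages; field names are illustrative.
@dataclass
class Task:
    goal: str
    plan: list[str] = field(default_factory=list)
    code: str = ""
    review: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


Stage = Callable[[Task], Task]


def orchestrator(task: Task) -> Task:
    # Stub: a real orchestrator would ask a reasoning model to decompose the goal.
    parts = [p.strip() for p in task.goal.split(",")]
    task.plan = [f"step {i}: {p}" for i, p in enumerate(parts, 1)]
    task.log.append("orchestrator")
    return task


def builder(task: Task) -> Task:
    # Stub: a real builder would prompt a code model with the plan only.
    task.code = "\n".join(f"# TODO {step}" for step in task.plan)
    task.log.append("builder")
    return task


def critic(task: Task) -> Task:
    # Stub check: flag an empty plan or missing code instead of calling a model.
    if not task.plan:
        task.review.append("empty plan")
    if not task.code:
        task.review.append("no code produced")
    task.log.append("critic")
    return task


def run_pipeline(task: Task, stages: list[Stage]) -> Task:
    # Each stage sees only the structured Task, never another stage's internals.
    for stage in stages:
        task = stage(task)
    return task


result = run_pipeline(
    Task(goal="parse input, call API, write report"),
    [orchestrator, builder, critic],
)
print(result.log)     # ['orchestrator', 'builder', 'critic']
print(result.review)  # []
```

Because the critic only inspects the `Task` fields, it could be swapped for a model-backed reviewer without touching the other stages.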

Roles and benchmarks

The author explicitly rejects the idea of a "best model overall." Instead, the model for each role is chosen by its results on the benchmark most relevant to that role. The orchestrator needs reasoning, so the reference benchmark is GPQA, with a score of 88.4%. The builder needs code generation, so HumanEval is used, with 92.7%. For the critic, understanding tool use and agent behavior in tasks matters most, so the choice rests on tau-bench, with 87.4%.

This specialization is why the stack uses six different open-source models rather than one GPT-class model for every task.

  • Orchestrator: strong reasoning, prioritization, and task decomposition
  • Builder: code generation and rapid engineering changes
  • Critic: verification of tool use, solution quality, and weak spots in the pipeline
  • Other roles: tests, deployment, and auxiliary stages, where shared model instances can be reused
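Benchmark-driven selection can be sketched as a simple lookup, assuming each candidate model reports scores on a few benchmarks. The role-to-benchmark mapping follows the article; `pick_models` is a hypothetical helper, and the model names and scores in the usage example are placeholders, not real results.

```python
# Which benchmark matters for which role, following the article's mapping.
ROLE_BENCHMARK = {
    "orchestrator": "GPQA",  # reasoning
    "builder": "HumanEval",  # code generation
    "critic": "tau-bench",   # tool use and agent behavior
}


def pick_models(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Return the best-scoring candidate for each role on that role's benchmark.

    `scores` maps a (hypothetical) model name to {benchmark: score}.
    Models without a score on a benchmark are skipped for that role.
    """
    choice = {}
    for role, bench in ROLE_BENCHMARK.items():
        ranked = [(s[bench], name) for name, s in scores.items() if bench in s]
        if not ranked:
            raise ValueError(f"no candidate reports {bench} for role {role}")
        choice[role] = max(ranked)[1]  # highest score wins
    return choice


# Placeholder names and numbers for illustration only; not real results.
candidates = {
    "model-a": {"GPQA": 80.0, "HumanEval": 90.0},
    "model-b": {"GPQA": 70.0, "HumanEval": 95.0, "tau-bench": 60.0},
    "model-c": {"tau-bench": 85.0},
}
print(pick_models(candidates))
# {'orchestrator': 'model-a', 'builder': 'model-b', 'critic': 'model-c'}
```

Each role is scored independently, so a model that is strong on one benchmark and weak on another can still win the role it fits.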

At the same time, nine agents do not mean nine full models loaded in memory at once. One practical trick is instance sharing: several roles use the same model when their load and task profiles are similar. As a result, the nine agents can run on just three or four model instances. This sharply reduces VRAM consumption, simplifies maintenance, and brings the architecture closer to a real production setup than to a demo built on an unlimited budget.
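The memory effect of instance sharing can be illustrated with a small calculation. The nine role names, the four model labels, and the per-instance VRAM figures below are all hypothetical; the sketch counts weight memory per loaded instance only and ignores KV cache and concurrent requests.

```python
# Hypothetical role -> model assignment; several roles share one served model.
ROLE_TO_MODEL = {
    "orchestrator": "reasoner",
    "architect": "reasoner",
    "builder": "coder",
    "refactorer": "coder",
    "tester": "coder",
    "critic": "critic",
    "deployer": "small",
    "documenter": "small",
    "monitor": "small",
}

# Hypothetical VRAM footprint of one loaded instance of each model, in GB.
VRAM_GB = {"reasoner": 20.0, "coder": 16.0, "critic": 12.0, "small": 5.0}


def instances(role_to_model: dict[str, str]) -> list[str]:
    # Roles pointing at the same model reuse one loaded instance.
    return sorted(set(role_to_model.values()))


def vram(role_to_model: dict[str, str], shared: bool = True) -> float:
    if shared:
        return sum(VRAM_GB[m] for m in instances(role_to_model))
    # Without sharing, every role loads its own copy of its model.
    return sum(VRAM_GB[m] for m in role_to_model.values())


print(len(ROLE_TO_MODEL), "roles ->", len(instances(ROLE_TO_MODEL)), "instances")
print(vram(ROLE_TO_MODEL, shared=False), "GB unshared vs", vram(ROLE_TO_MODEL), "GB shared")
```

The trade-off is throughput: roles that share an instance compete for it, which is why pairing roles with similar load profiles matters.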

Hardware and launch

The infrastructure side is also worth a look. The author describes three deployment configurations, ranging from a single RTX 4090 with 24 GB of memory to an A100 cluster with 211 GB in total. Between these extremes, you can trade off speed, quality, and parallelism. Costs are kept down through quantization and a well-designed inference setup, while an interactive dashboard tracks roles, load, and task progress.
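The arithmetic behind quantization is straightforward: weight memory is roughly the parameter count times bits per parameter divided by 8. The sketch below applies it to a hypothetical 32B-parameter model and the 24 GB of an RTX 4090; the 20% headroom for KV cache and runtime overhead is a rough assumption, not a figure from the article.

```python
def weight_gb(params_billion: float, bits: int) -> float:
    # bytes = params * bits / 8; 1e9 params at 1 byte each is about 1 GB (decimal).
    return params_billion * bits / 8


def fits(params_billion: float, bits: int, vram_gb: float, overhead: float = 0.2) -> bool:
    # overhead: assumed headroom fraction for KV cache, activations, and runtime.
    return weight_gb(params_billion, bits) * (1 + overhead) <= vram_gb


for bits in (16, 8, 4):
    gb = weight_gb(32, bits)
    print(f"32B @ {bits}-bit: {gb:.0f} GB weights, fits 24 GB: {fits(32, bits, 24)}")
```

Under these assumptions, only the 4-bit variant fits on one card, which illustrates why quantization is what makes a single-GPU configuration plausible at all.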

In other words, it is not just a matter of choosing models but also of giving them a proper operating environment. The practical takeaway is simple: open-source agent systems are no longer a lab toy. Such setups used to be associated either with expensive APIs or with heavy clusters; here, a more down-to-earth way to get started is shown. A small team can begin with a single server, check that the pipeline works, and then scale it as its tasks grow. The price tag now looks like an engineering trade-off rather than a barrier that filters out most teams from the start.

What it means

The market is moving away from the idea of one "magical" model toward role-oriented systems, where getting the composition right matters more than a big-name API. For businesses, the signal is that autonomous AI teams can be assembled from open-source components today, as long as they are treated as infrastructure and process rather than as a single chat window.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
