Google AI Ultra: how to turn a subscription into a pool of parallel agents and model consensus

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Apr 30, 2026. Reading time: 3 min.

Hamidun News Editorial

Google AI Ultra can be viewed not as an expensive Gemini subscription, but as the foundation for your own multi-agent environment. The idea is to distribute routine tasks among background workers, while using the main agent as an orchestrator and solution reviewer.

Subscription as Infrastructure

The main thesis of the analysis is simple: Ultra's price looks high, but it can be justified by how the work is organized rather than by the number of chats. Instead of pushing one model through endless refactoring, the original article proposes combining Antigravity IDE, Claude as the primary agent, and Gemini CLI as an almost unlimited pool of executors. If one model's daily limit runs out, the workflow doesn't collapse: the orchestrator switches to another model, and background tasks keep running independently.
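The fallback behavior can be illustrated with a minimal sketch. It assumes each model is wrapped in a plain callable that raises a quota error when its daily limit is used up; `QuotaExceeded`, `run_with_fallback`, and the stub clients are hypothetical names for illustration, not part of any real SDK or of the original setup.

```python
from typing import Callable, Sequence, Tuple


class QuotaExceeded(Exception):
    """Hypothetical error a model client raises when its daily limit is used up."""


ModelCall = Callable[[str], str]


def run_with_fallback(
    prompt: str, models: Sequence[Tuple[str, ModelCall]]
) -> Tuple[str, str]:
    """Try each model in order and return (model_name, answer) from the first
    one that still has quota."""
    for name, call in models:
        try:
            return name, call(prompt)
        except QuotaExceeded:
            continue  # this model is exhausted, move on to the next one
    raise QuotaExceeded("all configured models are out of quota")


def exhausted_model(prompt: str) -> str:
    # Stub: pretends the daily limit has already been reached.
    raise QuotaExceeded("daily limit reached")


def available_model(prompt: str) -> str:
    # Stub: stands in for a real model call.
    return f"plan for: {prompt}"


if __name__ == "__main__":
    chain = [("primary", exhausted_model), ("secondary", available_model)]
    print(run_with_fallback("refactor the auth module", chain))
```

The order of the list is the only policy here; a real orchestrator would probably also track when each limit resets.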

This approach solves two problems at once. The first is token cost in classic API scenarios, where every auxiliary agent consumes budget. The second is the bottleneck of a single IDE session: even if the environment can call sub-agents, it is hard to manage them, assign roles, and build repeatable processes.

The result is presented as a move from one smart window to a full team of agents, where some handle research, some write code, and some verify the proposed solution.

How the Pool Works

For this, the article proposes a custom MCP server called Agent-Pool-MCP. It works on a pull model: the main session doesn't wait for the background executor to finish a task; it just receives a task_id, moves on, and picks up the result later. Before a complex change, you can first delegate research in read-only mode, then ask a second model separately via consult_peer, and only then launch the refactoring. Instead of a linear scenario (think, do, check), you get a pipeline where different stages run in parallel and don't block each other.

the main IDE agent sets tasks and collects results
background workers on Gemini CLI perform analysis, code, and checks
consult_peer gives a second opinion from another model before code changes
shared directory `.agent/delegation/` routes artifacts between agents
skills and workflows define roles, checklists, and standard pipelines
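The pull model described above can be sketched in a few lines. This is a toy stand-in built on Python's standard thread pool, not the real Agent-Pool-MCP interface: `AgentPool`, `delegate`, `poll`, `wait`, and `fake_worker` are all assumed names, and only the idea of returning a task_id immediately comes from the article.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class AgentPool:
    """Toy background worker pool (hypothetical, not the Agent-Pool-MCP API)."""

    def __init__(self, max_workers: int = 4) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks: Dict[str, Future] = {}

    def delegate(self, worker: Callable[[str], str], prompt: str) -> str:
        """Queue the task and return its id immediately, without waiting."""
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = self._executor.submit(worker, prompt)
        return task_id

    def poll(self, task_id: str) -> Optional[str]:
        """Return the result if the task has finished, otherwise None."""
        future = self._tasks[task_id]
        return future.result() if future.done() else None

    def wait(self, task_id: str, timeout: float = 10.0) -> str:
        """Block until the task finishes (or the timeout expires)."""
        return self._tasks[task_id].result(timeout=timeout)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def fake_worker(prompt: str) -> str:
    # Placeholder for a call to a background executor such as Gemini CLI.
    return f"done: {prompt}"


if __name__ == "__main__":
    pool = AgentPool(max_workers=2)
    research_id = pool.delegate(fake_worker, "read-only research")
    audit_id = pool.delegate(fake_worker, "audit templates")
    # The orchestrator is free to do other work here, then pull the results.
    print(pool.wait(research_id), "|", pool.wait(audit_id))
    pool.shutdown()
```

The design point is that `delegate` never blocks: the caller decides when, and whether, to pull a result.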

The key rule here is extremely strict: agents must not edit the same files simultaneously. One writes conclusions to a separate markdown file, another reads them and formulates an architectural proposal, a third audits templates. This eliminates most collisions and turns the agent system into something like a development team with separate lanes of responsibility. The principle is stated as directly as possible:

No one touches each other's files.
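One way to enforce this rule mechanically is a small ownership registry that refuses a second claim on the same file. The sketch below is an assumption about how such a check could look: `LaneRegistry` and the file names under `.agent/delegation/` are hypothetical, and the article does not say the rule is enforced in code.

```python
from pathlib import PurePosixPath
from typing import Dict


class LaneRegistry:
    """Tracks which agent owns which file (hypothetical helper)."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def claim(self, agent: str, path: str) -> bool:
        """Return True if `agent` owns `path` after the call, False if another
        agent already owns it."""
        key = str(PurePosixPath(path))  # normalizes "./a//b" to "a/b"
        owner = self._owners.setdefault(key, agent)
        return owner == agent

    def release(self, agent: str, path: str) -> None:
        """Give up ownership, but only if `agent` is the current owner."""
        key = str(PurePosixPath(path))
        if self._owners.get(key) == agent:
            del self._owners[key]


if __name__ == "__main__":
    lanes = LaneRegistry()
    print(lanes.claim("researcher", ".agent/delegation/research.md"))
    print(lanes.claim("architect", ".agent/delegation/research.md"))
    print(lanes.claim("architect", ".agent/delegation/proposal.md"))
```

An agent that fails to claim a file would read it instead and write its own output to a separate file in its own lane.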

Consensus and Control

The most interesting part is cross-model consensus. Because the worker pool runs on Gemini CLI, the role of the main IDE agent goes to a different model, so that a proposal gets genuine external validation rather than a model agreeing with itself. In the example, Claude proposes an architectural solution, and Gemini looks for blind spots and returns a verdict such as AGREE or SUGGEST_CHANGES before anyone changes the codebase. On top of this sits fractal orchestration: the orchestrator can spawn new teams and nested sub-teams, much like engineering managers and developers in a conventional org structure.
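A consensus gate of this kind could be sketched as follows. The verdict names AGREE and SUGGEST_CHANGES come from the article; everything else is an assumption, including the idea that the verdict sits on the first line of the reviewer's reply and the function names `parse_verdict` and `may_apply`.

```python
from enum import Enum
from typing import List


class Verdict(Enum):
    AGREE = "AGREE"
    SUGGEST_CHANGES = "SUGGEST_CHANGES"


def parse_verdict(reply: str) -> Verdict:
    """Read the verdict from the first line of a reviewer's reply.

    Anything unrecognized is treated as SUGGEST_CHANGES, so the gate fails
    closed (assumed reply format, not a documented one).
    """
    stripped = reply.strip()
    first_line = stripped.splitlines()[0].strip().upper() if stripped else ""
    for verdict in Verdict:
        if first_line == verdict.value:
            return verdict
    return Verdict.SUGGEST_CHANGES


def may_apply(reviews: List[str]) -> bool:
    """Allow the code change only if at least one review exists and every
    reviewer agreed."""
    return bool(reviews) and all(
        parse_verdict(review) is Verdict.AGREE for review in reviews
    )


if __name__ == "__main__":
    reply = "SUGGEST_CHANGES\nThe cache layer ignores invalidation."
    print(parse_verdict(reply), may_apply([reply]))
```

Failing closed on an unparseable reply is a deliberate choice in this sketch: a garbled review should never count as approval.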

The behavior of waiting agents is discussed separately. Without additional tuning, they either poll the status of a background task pointlessly or rush to do the task themselves and duplicate the work. To address this, the system adds a hint called on_wait_hint: it tells the model when to switch to another useful task and when to simply wait for the result. The final emphasis is on security: treat any MCP server as a potential prompt injection point, pin versions, restrict file system access, and don't pass secrets in prompts.
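To make the waiting hint concrete, here is a rough sketch of a status response that carries it. Only the field name on_wait_hint comes from the article; the response layout, the `status_response` function, and the wording of the hints are assumptions for illustration.

```python
from typing import Dict


def status_response(
    task_id: str, done: bool, independent_work_available: bool
) -> Dict[str, str]:
    """Build a status payload for a background task (hypothetical layout)."""
    if done:
        return {
            "task_id": task_id,
            "status": "done",
            "on_wait_hint": "The result is ready; read it.",
        }
    if independent_work_available:
        hint = (
            "Do not redo this task and do not poll in a loop; "
            "continue with unrelated work and check back later."
        )
    else:
        hint = (
            "Nothing else is unblocked; wait for the result "
            "instead of redoing the task yourself."
        )
    return {"task_id": task_id, "status": "running", "on_wait_hint": hint}


if __name__ == "__main__":
    print(status_response("task-1", done=False, independent_work_available=True))
```

Putting the hint in the tool response, rather than only in the system prompt, means the model sees it at the exact moment it has to decide what to do next.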

What This Means

This analysis illustrates a shift in the AI tools market: the value of a subscription is now measured not only by model quality, but also by how easily you can build a working agent infrastructure on top of it. If the approach takes hold, expensive AI plans will be selling not more answers, but a full environment for parallel development, research, and internal AI review.
