
Vercel unveiled agent-browser for AI agents — lightweight browser access without MCP


AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

Vercel has released agent-browser, a tool that gives AI agents browser access without a bulky MCP layer. The idea is simple: instead of showing the model the page's entire DOM, show it only a short list of interactive elements it can act on immediately.

Why MCP Is Struggling

Playwright and Puppeteer aren't going anywhere: they remain powerful tools for e2e tests, CI/CD, and predictable scraping. The problems begin once the browser is handed to an LLM through MCP. To decide where to click, the model has to see the page.

Usually that means sending either raw HTML or an accessibility tree into the context. On modern SPAs this quickly adds thousands of extra tokens at every step and exhausts the agent's context before it reaches its goal. According to figures cited by the author of the original analysis, a single click plus a snapshot of a complex page can cost between 15,000 and 200,000 tokens.
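To make the cost concrete, here is a rough back-of-the-envelope sketch. The 15,000 to 200,000 tokens-per-step range comes from the figures cited above; the 200,000-token context window and the 500-token compact snapshot are illustrative assumptions, not measurements.

```python
# Rough estimate of how many browser steps fit in a context window.
# CONTEXT_WINDOW and the 500-token figure are illustrative assumptions.

CONTEXT_WINDOW = 200_000  # assumed model context size, in tokens


def steps_before_overflow(tokens_per_step: int, window: int = CONTEXT_WINDOW) -> int:
    """Whole steps whose snapshots fit in the window if nothing is evicted."""
    if tokens_per_step <= 0:
        raise ValueError("tokens_per_step must be positive")
    return window // tokens_per_step


if __name__ == "__main__":
    for label, cost in [
        ("full tree, low end", 15_000),
        ("full tree, high end", 200_000),
        ("compact ref list (assumed)", 500),
    ]:
        print(f"{label}: {steps_before_overflow(cost)} steps")
```

Under these assumptions the agent gets between 1 and 13 steps with full page trees, against roughly 400 with a compact list. Real agents also spend context on instructions and reasoning, so the practical numbers would be lower in every case.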

This is not only expensive but also unstable: the model spends its context reading the page tree, gets confused by CSS selectors, and misses the right buttons more often. The approach is still tolerable for deterministic scenarios, but it is too heavy for an autonomous agent that needs to move around the web quickly.

What Vercel Did

Vercel's goal was practical: an agent that writes the interface itself should be able to open a page, check a component, and perform basic browser actions. To that end, the team simplified agent-browser and dropped its earlier reliance on a Node daemon. The current version is a lightweight Rust CLI that talks directly to the Chrome DevTools Protocol (CDP). As a result, the tool is easier to run locally, easier to put in containers, and needs no additional Node infrastructure.

  • Single Rust binary
  • Direct CDP communication without extra layers
  • Zero dependencies for Docker and local environments
  • Short references instead of full DOM

The key idea is a snapshot of interactive elements. Instead of a giant tree, the agent gets a compact list such as button "Sign In" [ref=e1] or textbox "Email" [ref=e2], and then works with short commands: open a page, click @e1, fill @e2. This format takes hundreds of tokens rather than tens of thousands. For an LLM, that noticeably reduces the load and lowers the chance that an action breaks because of a fragile selector.
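A minimal sketch of how such a ref list could be consumed on the agent side. The line shape is copied from the two examples above; the tool's actual output may differ, and the names `Element`, `parse_snapshot`, and `find_ref` are hypothetical helpers, not part of agent-browser.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Pattern for lines such as: button "Sign In" [ref=e1]
# Assumed format, based only on the article's examples.
LINE_RE = re.compile(
    r'^\s*(?P<role>[\w-]+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>\w+)\]\s*$'
)


@dataclass(frozen=True)
class Element:
    role: str
    name: str
    ref: str


def parse_snapshot(text: str) -> Dict[str, Element]:
    """Map each ref (e.g. 'e1') to its element; non-matching lines are skipped."""
    elements: Dict[str, Element] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            elements[match["ref"]] = Element(match["role"], match["name"], match["ref"])
    return elements


def find_ref(elements: Dict[str, Element], role: str, name: str) -> Optional[str]:
    """Return the '@ref' handle of the first element with this role and name."""
    for element in elements.values():
        if element.role == role and element.name == name:
            return f"@{element.ref}"
    return None


SNAPSHOT = '''button "Sign In" [ref=e1]
textbox "Email" [ref=e2]'''

elements = parse_snapshot(SNAPSHOT)
print(find_ref(elements, "textbox", "Email"))  # prints @e2
```

The point of the design is that the handle the model passes back (@e2) carries no information about page structure, so a layout change does not invalidate it the way it would a CSS selector.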

New Interface for Agents

The difference is clear in a simple scenario: open a website and click the first article. In the classic MCP scheme, the agent first receives a huge accessibility tree, then searches it for the right heading and tries to assemble an accurate CSS selector. Any layout change, cookie modal, or extra container makes such a click fragile. In agent-browser the route is shorter: open, then snapshot, then click by short ref. The model relies on a prepared map of interactive elements rather than on guesses about DOM structure.
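The three-step route can be sketched as a small planner. This is an illustration only: the command strings mirror the article's shorthand (open, snapshot, click by ref), the `link` role and the sample snapshot are hypothetical, and nothing here invokes the real CLI or reproduces its exact syntax.

```python
import re
from typing import List

# Assumed snapshot line shape: role "Name" [ref=eN]
REF_LINE = re.compile(r'^\s*(?P<role>[\w-]+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>\w+)\]')


def first_ref_with_role(snapshot: str, role: str) -> str:
    """Return '@ref' for the first snapshot line with the given role."""
    for line in snapshot.splitlines():
        match = REF_LINE.match(line)
        if match and match["role"] == role:
            return "@" + match["ref"]
    raise LookupError(f"no element with role {role!r} in snapshot")


def plan_first_article_click(url: str, snapshot: str) -> List[str]:
    """Build the open -> snapshot -> click sequence in the article's shorthand.

    In a real run the snapshot text would be the output of the second step;
    here it is passed in so the sketch stays self-contained.
    """
    target = first_ref_with_role(snapshot, "link")
    return [f"open {url}", "snapshot", f"click {target}"]


# Hypothetical snapshot: a cookie button appears before the first article link.
SAMPLE = '''button "Accept cookies" [ref=e1]
link "First article" [ref=e2]
link "Second article" [ref=e3]'''

print(plan_first_article_click("https://example.com", SAMPLE))
```

Note that the cookie button in the sample does not disturb the plan: the agent selects by role from the element list, so an extra element ahead of the target changes the ref number but not the logic.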

"Don't use MCP for the browser — save your context windows and API money."

It's telling that Microsoft is already pushing a similar idea with @playwright/cli. There, too, the agent works through short commands, and browser state is stored outside the model's context. This is an important shift for the whole category of agentic tooling: the market is moving away from streaming browser internals directly into the LLM and toward a scheme in which a local tool maintains state itself and the model gets only the minimum control layer it needs. The difference between the solutions now lies mostly in the ecosystem: Playwright remains heavier, while Vercel's Rust approach is more minimalist.

What This Means

Browser automation for AI agents is beginning to split into two classes. Classic Playwright and Puppeteer remain the foundation for complex testing and scraping, but for agentic coding and quick interface validation, demand for lightweight CLI wrappers is increasingly visible. The main conclusion is simple: it pays to give an LLM a compact layer of commands and element references rather than the entire browser. It's cheaper, more stable, and more practical in real-world work.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
