Microsoft Research releases Webwright, a browser agent that completes long web tasks at a 60% success rate
Microsoft Research released Webwright, a terminal agent for browsers. Instead of standard click traces, it uses Playwright scripts.

Microsoft Research introduced Webwright, a framework for browser agents that completes complex web tasks nearly twice as often as the baseline language model on its own.
How Webwright Works
Webwright is a terminal-based agent that automates browser interaction. Its key feature: instead of the conventional click-trace approach, in which the system records a sequence of clicks and coordinates, Webwright generates and executes scripts for Playwright, a framework for programmatic browser automation.
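To make the contrast concrete, here is a minimal sketch in Python. It is not Webwright's own code: the click trace, the URL, the query, and the helper names (`search_script`, `RecordingPage`, `run_in_real_browser`) are all hypothetical. The Playwright calls used (`goto`, `get_by_role`, `fill`, `click`, `title`) belong to Playwright's sync `Page` API, and the recording stub exists only so the sketch can be exercised without a browser.

```python
from typing import Any, List, Optional, Tuple

# A click trace: the intent is buried in pixel positions of one specific layout.
CLICK_TRACE = [
    {"action": "click", "x": 412, "y": 88},
    {"action": "type", "text": "wireless headphones"},
    {"action": "click", "x": 640, "y": 88},
]


def search_script(page: Any, url: str, query: str) -> str:
    """The same intent written as a script against Playwright's sync Page API."""
    page.goto(url)
    page.get_by_role("searchbox").fill(query)          # locate by role, not by pixels
    page.get_by_role("button", name="Search").click()
    return page.title()


class _RecordingLocator:
    def __init__(self, calls: List[Tuple], role: str, name: Optional[str]):
        self.calls, self.role, self.name = calls, role, name

    def fill(self, value: str) -> None:
        self.calls.append(("fill", self.role, value))

    def click(self) -> None:
        self.calls.append(("click", self.role, self.name))


class RecordingPage:
    """Hypothetical dry-run stand-in for a Page: records calls, drives nothing."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def get_by_role(self, role: str, name: Optional[str] = None) -> _RecordingLocator:
        return _RecordingLocator(self.calls, role, name)

    def title(self) -> str:
        return "dry run"


def run_in_real_browser(url: str, query: str) -> str:
    """Assumes `pip install playwright` and `playwright install chromium`."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        try:
            return search_script(page, url, query)
        finally:
            browser.close()
```

Passing a `RecordingPage` to `search_script` shows the sequence of high-level actions the script would perform; `run_in_real_browser` runs the same function against a real Chromium instance.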
The framework is deliberately small: roughly 1,000 lines of code, organized as three modules working in a unified agent cycle. Such a minimalist design may look naive at first, but the results are impressive. Rather than generating clicks point by point, the agent works from the DOM structure and writes the scripts it needs.
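The article does not name the three modules, so the following is only a rough sketch of what a three-part agent cycle could look like. The roles used here (a script writer, a script runner, and a completion check) and every identifier are assumptions for illustration, not Webwright's actual design; the model and the browser are replaced by stub functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    script: str   # what the writer produced on this step
    output: str   # what running it returned
    done: bool    # whether the check considered the task finished


@dataclass
class AgentCycle:
    # Three hypothetical modules, passed in as plain callables.
    write_script: Callable[[str, List[StepResult]], str]  # would call the model
    run_script: Callable[[str], str]                       # would execute in a browser
    is_done: Callable[[str, str], bool]                    # would inspect the output
    max_steps: int = 5
    history: List[StepResult] = field(default_factory=list)

    def run(self, task: str) -> bool:
        """Write a script, run it, check the result; repeat until done or out of steps."""
        for _ in range(self.max_steps):
            script = self.write_script(task, self.history)
            output = self.run_script(script)
            done = self.is_done(task, output)
            self.history.append(StepResult(script, output, done))
            if done:
                return True
        return False


# Stub modules so the cycle can be run without a model or a browser.
def fake_writer(task: str, history: List[StepResult]) -> str:
    return f"step {len(history) + 1} for: {task}"


def fake_runner(script: str) -> str:
    return "order confirmed" if script.startswith("step 3") else "page loaded"


def fake_check(task: str, output: str) -> bool:
    return output == "order confirmed"
```

With these stubs, `AgentCycle(fake_writer, fake_runner, fake_check).run("buy a ticket")` finishes on the third step. Each step sees the full history, which is what lets a real writer module react to earlier failures.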
Benchmark Results
On the Odysseys benchmark, which tests execution of long web tasks in a real browser, Webwright with GPT-5.4 achieved 60.1%. That is about 1.8 times the 33.5% the model scores on its own. On the simpler Online-Mind2Web benchmark the score is higher still, at 86.7%. Notably, this is the best result among all open-source harness recipes.
The nearly twofold improvement was not achieved through special tricks or hardcoded solutions. It is a direct consequence of sound agent cycle design and efficient use of GPT-5.4's capabilities.
- Odysseys benchmark: 60.1% (versus 33.5% for the baseline model)
- Online-Mind2Web: 86.7% (best among open-source harness recipes)
- Framework size: ~1000 lines of code
- Architecture: three modules in a unified cycle
- Model: GPT-5.4 (standard, no fine-tuning)
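The size of the Odysseys gain can be checked directly from the two reported scores. The short calculation below uses only the figures quoted in this article; the variable names are illustrative.

```python
baseline = 33.5    # Odysseys score of the model alone, in percent
webwright = 60.1   # Odysseys score of Webwright with the same model, in percent

ratio = webwright / baseline          # how many times higher the score is
gain = webwright - baseline           # absolute gain in percentage points
relative = gain / baseline * 100      # relative improvement in percent

print(f"ratio: {ratio:.2f}x")             # ratio: 1.79x
print(f"gain: {gain:.1f} points")         # gain: 26.6 points
print(f"relative: +{relative:.1f}%")      # relative: +79.4%
```

So the improvement is roughly 1.8x, or about 27 percentage points: close to double, though not quite.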
Why This Works
Browser agents have long relied on click-trace sequences or required massive language models. Webwright demonstrates a third path: sound architecture, with Playwright scripts as an intermediate language, delivers significant quality gains. Playwright also lets the agent work with the DOM directly, which is more reliable than relying on computer vision. When a website changes, the script can adapt because it targets the page structure, not just pixels.
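A toy analogy illustrates why structural targeting survives layout changes. It uses Python's standard `html.parser` rather than Playwright, and a button's position in the document stands in for recorded pixel coordinates. The two page snippets and the helper names are invented for the example.

```python
from html.parser import HTMLParser
from typing import List


class ButtonFinder(HTMLParser):
    """Collects the text label of every <button>, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.buttons: List[str] = []
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._in_button = True
            self.buttons.append("")

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False

    def handle_data(self, data):
        if self._in_button:
            self.buttons[-1] += data.strip()


def button_labels(html: str) -> List[str]:
    finder = ButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.buttons


# The same page before and after a redesign that moves the Checkout button.
OLD_PAGE = "<div><button>Checkout</button><button>Help</button></div>"
NEW_PAGE = ("<div><button>Help</button><button>Sign in</button>"
            "<button>Checkout</button></div>")


def by_position(html: str, index: int) -> str:
    """Analogue of a recorded click position: whatever sits at that slot."""
    return button_labels(html)[index]


def by_label(html: str, label: str) -> bool:
    """Analogue of a structural locator: find the button by what it is."""
    return label in button_labels(html)
```

Here `by_position(OLD_PAGE, 0)` hits "Checkout", but the same position on `NEW_PAGE` hits "Help", while `by_label(..., "Checkout")` succeeds on both versions.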
What This Means for the Market
Browser agents are maturing. Microsoft Research has shown its approach, while OpenAI (Operator), Anthropic (Computer Use), and others are working in parallel. The web-automation market is only beginning to form: form filling, price comparison, service ordering, subscription management. Webwright suggests that good results do not require waiting for super-models: sound architecture and simple modules can nearly double task success with an existing model.