H Company introduces Holo3 — an AI agent for computer use with a record score on OSWorld-Verified

Q: What is the source?

Originally published on Hugging Face Blog. Hamidun News processes and adapts the material with AI.

Q: When was it published?

May 2, 2026. Reading time: 3 min.

Hamidun News Editorial

AI monitoring · Hugging Face Blog

H Company has presented Holo3, a new computer-use model that, according to the company, scored 78.85% on the OSWorld-Verified benchmark, the top result among computer-use systems. The developers position it not as a laboratory prototype but as a foundation for enterprise agents that work with real interfaces and multi-step tasks.

Record in OSWorld

The headline figure in the announcement is 78.85% on OSWorld-Verified, one of the key benchmarks for how well models carry out tasks on an ordinary computer. H Company stresses that Holo3 reaches this score with a relatively compact configuration: the model has 10 billion active parameters out of 122 billion total. The company also compares its costs with those of larger closed systems such as GPT 5.4 and Opus 4.6 and promises cheaper inference. Open weights for Holo3-35B-A3B, which judging by its name is a smaller variant than the 122-billion-parameter configuration, are already available on Hugging Face under the Apache 2.0 license.
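To put the parameter figures in proportion, here is a minimal sketch that computes the share of parameters active at a time. It assumes the usual meaning of "active parameters" in sparse models; the function name is illustrative and does not come from H Company.

```python
# Figures quoted in the announcement: 10B active out of 122B total parameters.
ACTIVE_PARAMS_B = 10
TOTAL_PARAMS_B = 122


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that are active (hypothetical helper)."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_b / total_b


share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{share:.1%}")  # 8.2%
```

Roughly one parameter in twelve is active, which is consistent with the company's pitch of cheaper inference, although actual cost depends on many factors the article does not cover.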

How It Was Trained

The foundation of Holo3 is what's called an agentic learning flywheel — a continuous learning cycle that improves two things: interface perception and decision-making. Instead of a single set of screenshots or manual scenarios, the team builds a stream of tasks on which the model learns to understand the screen, choose the next step, and maintain context in long sequences of actions. Special focus was placed on generalization: the system is trained not on one product, but on a class of interfaces it might encounter in its work.

The announcement names three components:
Synthetic Navigation Data: navigation scenarios built from human-written and generated instructions.
Out-of-Domain Augmentation: programmatic expansion of scenarios so that the agent does not break on unexpected interfaces and deviations from templates.
Curated Reinforcement Learning: data filtering plus reinforcement learning to maximize accuracy on real tasks.

The aim is to teach the general skill of working with interfaces rather than one specific CRM or website. That is why H Company bets not only on the final benchmark score but also on transferability: if the model understands the logic of screens and can make decisions step by step, it is easier to adapt to new systems without complete retraining. This matters most for enterprise software, where interfaces are often non-standard and change faster than datasets can be updated.

Synthetic Office

To check whether this approach works outside the laboratory, the company built a Synthetic Environment Factory that produces synthetic corporate environments. Coding agents automatically assemble websites and interfaces from scratch according to specified requirements, and verifiable tasks of varying difficulty are then generated for them. On this basis, H Company created a separate suite, H Corporate Benchmarks: 486 realistic multi-step tasks in four categories (e-commerce, business software, collaboration tools, and multi-app scenarios). These are closer to how employees actually work inside a company than to toy demos. The most complex tasks require coordinating several systems at once.
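What a "verifiable task" might look like can be sketched as follows. This is an illustration only, not H Company's actual task format: the `Task` class, the `State` dictionary, and the sample tasks are hypothetical, and only the four category names come from the article.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Category names taken from the article; everything else here is hypothetical.
CATEGORIES = ("e-commerce", "business software", "collaboration tools", "multi-app")

State = Dict[str, Any]  # hypothetical snapshot of the environment after the agent finishes


@dataclass(frozen=True)
class Task:
    category: str
    instruction: str
    verify: Callable[[State], bool]  # programmatic check of the final state

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def success_rate(tasks: List[Task], final_states: List[State]) -> float:
    """Fraction of tasks whose verifier accepts the corresponding final state."""
    if len(tasks) != len(final_states):
        raise ValueError("need exactly one final state per task")
    if not tasks:
        return 0.0
    passed = sum(task.verify(state) for task, state in zip(tasks, final_states))
    return passed / len(tasks)


tasks = [
    Task("e-commerce", "Add two units of SKU-1 to the cart",
         lambda s: s.get("cart", {}).get("SKU-1") == 2),
    Task("collaboration tools", "Post 'done' in the #ops channel",
         lambda s: "done" in s.get("channels", {}).get("#ops", [])),
]
final_states = [
    {"cart": {"SKU-1": 2}},          # first task completed correctly
    {"channels": {"#ops": ["hi"]}},  # second task failed: message missing
]
rate = success_rate(tasks, final_states)
print(rate)  # 0.5
```

Checking the final state rather than the sequence of clicks lets the same task be scored regardless of the path the agent took through the interface.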

An example from the article: the agent must extract equipment prices from a PDF, compare them with each employee's remaining budget, and then automatically send personalized emails with an approval or denial. For such a chain, simply recognizing text on the screen is not enough. It takes calculations, document handling, memory of intermediate steps, and the ability to keep the goal in sight throughout. According to H Company, it is in such scenarios that Holo3 outperforms the baseline Qwen3.5 models, and it also leads in single-app tests.
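The decision step in that example can be sketched in a few lines. This is an assumption-laden illustration, not the benchmark's code: the prices, budgets, names, and the `decide_requests` helper are invented, the PDF extraction is assumed to have already produced the `prices` dictionary, and each employee is assumed to make a single request.

```python
from decimal import Decimal
from typing import Dict, List, NamedTuple, Tuple


class Decision(NamedTuple):
    employee: str
    item: str
    approved: bool
    message: str


def decide_requests(
    requests: List[Tuple[str, str]],
    prices: Dict[str, Decimal],
    budgets: Dict[str, Decimal],
) -> List[Decision]:
    """Approve a request only if the item's price fits the remaining budget."""
    decisions = []
    for employee, item in requests:
        price = prices[item]
        remaining = budgets[employee]
        approved = price <= remaining
        if approved:
            message = f"Hi {employee}, your request for {item} ({price}) is approved."
        else:
            message = (
                f"Hi {employee}, your request for {item} ({price}) "
                f"exceeds your remaining budget ({remaining})."
            )
        decisions.append(Decision(employee, item, approved, message))
    return decisions


# Hypothetical data standing in for the PDF price list and the budget sheet.
prices = {"laptop": Decimal("1200.00"), "monitor": Decimal("300.00")}
budgets = {"Ana": Decimal("1500.00"), "Ben": Decimal("250.00")}
requests = [("Ana", "laptop"), ("Ben", "monitor")]

decisions = decide_requests(requests, prices, budgets)
for d in decisions:
    print(d.approved, d.message)
```

The logic itself is trivial; the hard part for an agent is getting the inputs out of several applications and carrying them through the whole chain without losing track.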

What This Means

The market for computer-use AI agents is shifting from demonstrations to product scenarios: clicking around a screen is no longer enough, and an agent also has to handle corporate routine and non-standard interfaces. Holo3 is interesting precisely because of this focus. If the stated results hold up outside internal tests, businesses will get another real candidate for the role of office AI agent rather than just another model for leaderboards. The competition is now about readiness for real office work, not only model quality.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
