Thomson Reuters lays out four rules for AI agents that businesses can trust
AI-processed from ZDNet AI; edited by Hamidun News
AI agents are rapidly moving from experiments into workflows, and companies face a key question: how can they make agents reliable enough for real-world tasks? Thomson Reuters believes the answer lies not in the magic of models but in development discipline, testing, and integration with existing tools.
How to Measure Success
According to Joel Hron, CTO of Thomson Reuters Labs, the first step is to define in advance what actually constitutes a good result. For agentic systems, this is harder than for conventional software: it is not enough to check that an answer "looks right." The team has to spell out formally which qualities make a result good, where the agent can fail, which deviations the business will tolerate, and at what point a human must step in. The company uses multiple evaluation levels so as not to rely on a single metric or test set:
- public benchmarks for early evaluation of new models
- internal tests with clear quality criteria for answers
- automated checks for rapid development cycles
- final evaluation by domain experts
Automation speeds up iteration, but the final sign-off still has to come from people. Hron emphasizes that before releasing a product, the team wants confirmation from human experts, not just from metrics and automated tests. In markets where a mistake costs money or time or creates legal risk, this approach is not excessive caution but a mandatory requirement. Otherwise, an agent might show great results in a demo but fail in the real world, where nuance and professional context matter.
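The layered evaluation described in this section can be pictured as a release gate. The following is a minimal sketch, not Thomson Reuters' actual pipeline: the names `EvalStage` and `release_gate`, the score scale from 0 to 1, and the thresholds are all assumptions made for illustration. Automated stages run in order, and the release is allowed only after an explicit expert approval.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalStage:
    """One automated evaluation level (hypothetical structure)."""
    name: str
    run: Callable[[], float]  # returns a score between 0 and 1
    threshold: float          # minimum score needed to pass this stage


def release_gate(stages: list[EvalStage],
                 expert_approved: Optional[bool]) -> tuple[bool, str]:
    """Run automated stages in order, then require human sign-off."""
    for stage in stages:
        score = stage.run()
        if score < stage.threshold:
            return False, f"blocked at {stage.name}: {score:.3f} < {stage.threshold:.3f}"
    # Passing every automated check is not enough on its own.
    if expert_approved is None:
        return False, "awaiting expert review"
    if not expert_approved:
        return False, "rejected by expert review"
    return True, "released"


# Placeholder scores; real stages would run benchmarks and test suites.
stages = [
    EvalStage("public benchmark", lambda: 0.93, 0.90),
    EvalStage("internal tests", lambda: 0.995, 0.99),
    EvalStage("automated checks", lambda: 1.0, 1.0),
]
print(release_gate(stages, expert_approved=None))
print(release_gate(stages, expert_approved=True))
```

The design choice worth noting is that `expert_approved=None` blocks the release: a missing human verdict is treated the same as a failed check, which mirrors the point that metrics alone do not establish trust.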
Common Language for Teams
Thomson Reuters' second insight is that an agent cannot be designed separately from the interface and user experience. If a company wants its employees to work with an agent as a digital colleague, employees need a common language, an intuitive interface, and transparent interaction logic. Users should see not just the result but the system's reasoning: what steps it takes, where it requests data, when it uses tools, and when it needs human review. Without this transparency, the agent is perceived as a black box, not as a helper.
This leads to practical advice: designers, product teams, and data scientists should not work in silos but literally side by side. Hron puts it plainly: seat designers next to data scientists and have them discuss regularly what is happening inside the agent. The tighter this collaboration, the faster an interface emerges that does not hide the system's thinking but makes it manageable. For the business, this is also protection against false autonomy, where a polished interface masks unstable logic.
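One way to picture the transparency described in this section is a structured trace that the interface can render step by step. This is a rough sketch under stated assumptions: `StepKind`, `Step`, and `AgentTrace` are hypothetical names, and the four step kinds simply mirror the categories listed above (reasoning steps, data requests, tool use, human review).

```python
from dataclasses import dataclass, field
from enum import Enum


class StepKind(Enum):
    REASONING = "reasoning"
    DATA_REQUEST = "data_request"
    TOOL_CALL = "tool_call"
    HUMAN_REVIEW = "human_review"


@dataclass
class Step:
    kind: StepKind
    summary: str


@dataclass
class AgentTrace:
    """Ordered record of what the agent did, kept for display to the user."""
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: StepKind, summary: str) -> None:
        self.steps.append(Step(kind, summary))

    def needs_human(self) -> bool:
        # True if any step asked for a person to review the work.
        return any(s.kind is StepKind.HUMAN_REVIEW for s in self.steps)

    def render(self) -> str:
        # Numbered, human-readable view of the trace.
        return "\n".join(
            f"{i}. [{s.kind.value}] {s.summary}"
            for i, s in enumerate(self.steps, start=1)
        )


trace = AgentTrace()
trace.record(StepKind.REASONING, "Split the request into two sub-questions")
trace.record(StepKind.TOOL_CALL, "Called the document search tool")
trace.record(StepKind.HUMAN_REVIEW, "Flagged the draft answer for expert review")
print(trace.render())
```

Keeping the trace as data rather than free text gives designers and data scientists a shared object to discuss: the interface decides how to show each step kind, while the agent decides what to record.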
Tools and Partners
The third lesson is: don't try to build an "all-knowing" agent that can do everything by itself. Thomson Reuters takes a different approach: break down existing products and turn their functions into verified tools the agent can work with. If a company has dozens of mature applications built up over years, they become not a burden but a set of reliable modules for the new agentic architecture. This approach is especially important now, as models are making significant progress in code generation, plan execution, and multi-step reasoning, but still cannot guarantee predictability on their own.
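The idea of turning existing product functions into verified tools can be sketched as a small registry. Everything here is illustrative: `ToolRegistry` is a hypothetical class, and `normalize_case_id` stands in for a function that already exists and has been tested inside a mature application. The agent can only call what has been registered explicitly.

```python
from typing import Callable


class ToolRegistry:
    """Explicit allow-list of functions an agent may call (hypothetical)."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., object]] = {}

    def register(self, name: str, func: Callable[..., object]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = func

    def call(self, name: str, **kwargs: object) -> object:
        # Unknown tool names are rejected instead of being improvised.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


def normalize_case_id(raw: str) -> str:
    """Stand-in for existing, already-tested product logic."""
    return raw.strip().upper().replace(" ", "-")


registry = ToolRegistry()
registry.register("normalize_case_id", normalize_case_id)
print(registry.call("normalize_case_id", raw="  ab 123 "))
```

With this shape, predictability comes from the registered functions rather than from the model: the model chooses which tool to call, but the behavior of each tool is fixed and testable on its own.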
"We're not playing at 90%.
We're playing at 99% and 99.9%," is how Hron describes the bar for AI agent products.
This leads to the fourth piece of advice: look beyond your own company to learn. Thomson Reuters launched the Trust in AI Alliance with Anthropic, AWS, Google Cloud, and OpenAI, and is also developing a partnership with Imperial College London. The focus of such initiatives is explainability, transparency, and those very "last nines" of accuracy that separate an impressive prototype from a working product. For companies, this is a signal: an agentic stack cannot be built in isolation if the goal is not just to ship a trendy feature but to bring the system to a level where it can be trusted with real decisions.
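A short calculation suggests why the "last nines" matter so much for agents that chain several steps together. The model below is an assumption for illustration and does not come from Thomson Reuters: it treats each step as succeeding independently with the same probability, so a task of n steps succeeds with probability p to the power n. The function name `task_success_rate` and the choice of ten steps are hypothetical.

```python
def task_success_rate(step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps."""
    return step_accuracy ** steps


# Compare the accuracy levels from the quote over a ten-step task.
for p in (0.90, 0.99, 0.999):
    rate = task_success_rate(p, 10)
    print(f"{p:.1%} per step over 10 steps -> {rate:.1%} per task")
```

Under this simplified model, 90% per step leaves roughly a one-in-three chance of finishing a ten-step task cleanly, 99% gives about 90%, and 99.9% gives about 99%. Real agent steps are rarely independent, so the numbers are a rough intuition rather than a forecast.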
What This Means
The main point of the article is straightforward: businesses shouldn't wait for a mythical perfect agent. Reliable systems are built from measurable quality criteria, tight collaboration between product and technical teams, verified internal tools, and external exchange of practices. The winners won't be companies whose agent sounds smartest, but those whose agent's behavior is best tested, most understandable to users, and most deeply embedded in real operational work.