AWS Releases ToolSimulator for Safe Testing of AI Agents in Strands Evals
AI-processed from AWS Machine Learning Blog; edited by Hamidun News
AWS has released ToolSimulator, a framework for testing AI agents that work with external tools. Instead of making risky calls to real APIs, it uses an LLM to simulate tool responses dynamically, which keeps tests safe and scalable and prevents data leaks. Any AI agent that can call APIs, read databases, or manage external services faces the same testing problem: how do you verify agent behavior without affecting production?
Traditional approaches don't fully solve it. Direct calls to real APIs are dangerous — the agent might accidentally send an email, create a CRM record, or leak personal data to an external service. Static mocks work for simple scenarios, but fail on multi-step dialogs where a tool's response affects the agent's next request.
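The limitation of static mocks can be seen in a small sketch. The tool names (`cancel_order`, `get_order_status`) and both helpers are hypothetical, invented for illustration only. The static mock keeps reporting an order as shipped even after the agent has cancelled it, while a stateful stand-in gives an answer that depends on the earlier call.

```python
# Hypothetical tools for illustration; these names do not come from any real API.
STATIC_MOCK = {
    "cancel_order": {"ok": True},
    "get_order_status": {"status": "shipped"},  # canned answer, never changes
}


def static_tool(name, **arguments):
    """Return the same canned response regardless of earlier calls."""
    return STATIC_MOCK[name]


class StatefulFake:
    """A tiny stand-in whose later answers depend on earlier calls."""

    def __init__(self):
        self.orders = {"A1": "shipped"}

    def call(self, name, order_id):
        if name == "cancel_order":
            self.orders[order_id] = "cancelled"
            return {"ok": True}
        if name == "get_order_status":
            return {"status": self.orders[order_id]}
        raise KeyError(name)


# Step 1: the agent cancels the order. Step 2: it checks the status.
static_tool("cancel_order", order_id="A1")
static_status = static_tool("get_order_status", order_id="A1")

fake = StatefulFake()
fake.call("cancel_order", "A1")
stateful_status = fake.call("get_order_status", "A1")
```

Here `static_status` still says the order is shipped, so an agent tested against the static mock never sees the consequence of its own action. Hand-writing state for every tool, as `StatefulFake` does, is the work that a simulated tool is meant to reduce.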
AWS proposes a third path: ToolSimulator, part of the Strands Evals SDK. The framework uses an LLM to generate realistic tool responses, as if the real API were answering the agent's request. No data leaves the test setup: everything happens inside an isolated test environment.
In practice, the developer describes the tools the agent uses: their schemas, possible responses, and edge cases. ToolSimulator takes these descriptions and, when the agent calls a tool during testing, generates a plausible response.
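The general idea can be sketched in a few lines of Python. This is not the Strands Evals API: `ToolSpec`, `SketchSimulator`, and `fake_llm` are hypothetical names made up for this illustration, and the "LLM" is a deterministic placeholder so the example runs offline. It only shows the flow described above: a tool description plus the call arguments and call history go into a prompt, and a generated response comes back in place of a real API result.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolSpec:
    """Hypothetical tool description: name, parameter schema, free-form notes."""
    name: str
    parameters: dict   # e.g. {"to": "string", "body": "string"}
    notes: str = ""    # possible responses and edge cases, in plain language


@dataclass
class SketchSimulator:
    """Routes tool calls to a response generator instead of a real API."""
    specs: dict                      # tool name -> ToolSpec
    generate: Callable[[str], str]   # stand-in for an LLM call: prompt -> JSON text
    history: list = field(default_factory=list)

    def call(self, tool_name: str, **arguments) -> dict:
        spec = self.specs[tool_name]
        missing = set(spec.parameters) - set(arguments)
        if missing:
            raise ValueError(f"missing arguments: {sorted(missing)}")
        # The prompt carries the description, the call, and everything so far,
        # so a later response can be consistent with earlier ones.
        prompt = json.dumps({
            "tool": spec.name,
            "schema": spec.parameters,
            "notes": spec.notes,
            "arguments": arguments,
            "history": self.history,
        })
        response = json.loads(self.generate(prompt))
        self.history.append(
            {"tool": tool_name, "arguments": arguments, "response": response}
        )
        return response


def fake_llm(prompt: str) -> str:
    """Deterministic placeholder; a real setup would call a model here."""
    request = json.loads(prompt)
    return json.dumps({
        "tool": request["tool"],
        "status": "ok",
        "turn": len(request["history"]) + 1,
    })


email_spec = ToolSpec("send_email", {"to": "string", "body": "string"})
sim = SketchSimulator(specs={"send_email": email_spec}, generate=fake_llm)
first = sim.call("send_email", to="a@example.com", body="hello")
second = sim.call("send_email", to="b@example.com", body="follow-up")
```

Passing the history into each prompt is the design choice that matters for multi-step tests: the second response can depend on the first call, which a static mock cannot do.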
The agent cannot tell that it is working with a simulator rather than a real service. This allows testing multi-step chains: the agent gets a response, makes the next decision, calls the tool again, and so on throughout the scenario.
The framework's key capabilities fall into three areas. Scale: ToolSimulator lets you run hundreds of test scenarios in parallel, which would be very expensive with real APIs and would strain infrastructure. Edge case coverage: you can simulate API unavailability, slow responses, unexpected data formats, and authorization errors, then check how the agent behaves in each situation. Security: with no real calls, there is no risk of personal data leaks or unintended actions in production.
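Edge-case coverage and parallel runs can be illustrated with a standard-library sketch. Everything in it is an assumption made for illustration: the fault-mode names, `make_tool`, `toy_agent`, and `run_scenarios` are hypothetical and are not part of ToolSimulator. Each scenario gives a placeholder agent a tool that fails in one specific way, and the scenarios run concurrently in a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tool(mode):
    """Build a fake tool that behaves according to one hypothetical fault mode."""
    def tool(query):
        if mode == "unavailable":
            raise ConnectionError("service unavailable")
        if mode == "slow":
            # Modelled as an immediate timeout so the sketch does not sleep.
            raise TimeoutError("response took too long")
        if mode == "auth_error":
            return {"status": 401, "error": "unauthorized"}
        if mode == "bad_format":
            return "<html>not json</html>"
        return {"status": 200, "data": f"result for {query}"}
    return tool


def toy_agent(tool):
    """Placeholder agent: makes one tool call and maps it to an outcome label."""
    try:
        result = tool("order A1")
    except (ConnectionError, TimeoutError):
        return "retry_later"
    if not isinstance(result, dict):
        return "unparseable"
    if result.get("status") == 401:
        return "reauthenticate"
    return "done"


def run_scenarios(modes):
    """Run one scenario per fault mode in parallel; return outcomes keyed by mode."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        outcomes = pool.map(lambda mode: toy_agent(make_tool(mode)), modes)
        return dict(zip(modes, outcomes))


results = run_scenarios(["ok", "unavailable", "slow", "auth_error", "bad_format"])
```

A test suite would then assert on `results`, for example that an authorization error leads the agent to re-authenticate rather than report success.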
ToolSimulator is available now as part of the Strands Evals SDK — AWS's open-source toolkit for evaluating AI agent quality. Strands Agents is a relatively new agent framework from AWS; Strands Evals appeared as an accompanying component for testing. ToolSimulator extends this toolkit with a solution to one of the most painful tasks — reliable testing of agents with real dependencies.
The problem affects the entire industry. As AI agents move from demos to production, reliability requirements rise sharply. An agent managing email, CRM, or financial transactions must behave predictably under any conditions, including when the tools it relies on behave unexpectedly.
Before approaches like ToolSimulator appeared, developers had to choose between incomplete test coverage and the risk posed by real API calls in a test environment. For agent developers, ToolSimulator lowers the barrier to writing comprehensive tests in cases where doing so was previously too complex or dangerous. Teams will be able to detect integration bugs faster, verify edge cases systematically, and release agents with greater confidence in their behavior.
The tool fits a broader trend: as the agent market matures, specialized solutions emerge not only for creating agents, but also for testing and evaluating them — and AWS is betting on capturing this niche.