AI agent security in production: a practical guide to Red Teaming
An agent with access to email and documents is a risky system. A mistake can lead to data leaks or financial losses. Doubletapp published a guide to Red Teaming

An agent is not a chatbot. It's a system with access to tools, services, and corporate data. A model error in an isolated chat is awkward. An agent error with access to email and documents is a potential data breach, reputational or financial incident.
What Makes Agent Red Teaming Different
Red Teaming of LLMs focuses on the language model itself: we test for prompt injection, jailbreak, hallucinations. When the model answers incorrectly, it's a local problem. Red Teaming an agent is a completely different matter. Here we examine the entire stack: the model, the tools, external APIs, integrations with corporate systems, request routing logic. An agent may answer questions correctly but make a mistake choosing a tool, pass parameters incorrectly, or forget to check access rights. And suddenly the agent performs an action it shouldn't have. One error in this chain is an incident. Doubletapp developed a Red Teaming methodology that covers both levels: vulnerabilities in the model itself plus vulnerabilities in its integration with the outside world.
Promptfoo: From Theory to Practice
Promptfoo is a framework for automating Red Teaming. You define test scenarios (attack scenarios), a set of dangerous prompts, and rules for checking results. The tool runs these tests against your agent and generates a report of which attacks succeeded. The basic workflow is straightforward: describe the behavior you want to protect; write test scenarios—attempts to make the agent violate that behavior; run Promptfoo—the tool automatically runs all tests; review the report and identify the gaps; fix the vulnerability, repeat. The tool supports integration with OpenAI, Anthropic, Claude, and other models. All logs are transparent, detailed, and easy to analyze.
What Vulnerabilities to Look For
In practice, Doubletapp encountered recurring classes of problems:
- Improper tool authorization—the agent chooses the right tool but doesn't check if the user has rights for this operation
- Parameter confusion—the agent passes user_id instead of admin_id due to unclear naming in the API specification
- Chain attacks—one small error plus another small error together result in a complete system bypass
- Social engineering through the model—an attacker makes the agent believe it's authorized when it actually isn't
- Context leakage through logs—the agent logs sensitive data that another user later sees
"This is the first step in the process, not the final product,"—roughly how people talk about any Red Teaming.
The first round of testing will expose gaps that then need to be closed wave by wave.
What This Means
Red Teaming is emerging from laboratories into operational reality. If you've already deployed an agent to production, you need a system that continuously searches for vulnerabilities. Promptfoo is one of the tools you can set up right now and use on your stack. Business now demands not just functionality, but proof of security. And this is the right demand—because the stakes are high.