Anthropic on AI Agents in Cybersecurity: Capabilities and Pitfalls
AI-processed from Habr AI; edited by Hamidun News
Anthropic published research on the application of AI agents to cybersecurity tasks. Edgar Sipki, a Habr author and founder of easyp & sipki tech, took it upon himself to analyze the document and ask an uncomfortable question: how much can we actually trust these agents in practice?
What Anthropic Says
The company tested Claude agents on a wide range of information security tasks: static code analysis, vulnerability detection, threat model building, and infrastructure security assessment. In typical scenarios, the agents performed above the level of an average specialist: they processed large code bases faster and caught common vulnerability patterns that are easy to miss during manual review under deadline pressure.
Areas where agents already provide practical value:
- Static code analysis — detection of SQL injections, XSS, unsafe dependencies, and hardcoded secrets
- Automated threat model building for new services
- Accelerated penetration testing: the agent maps the attack surface while the specialist focuses on complex vectors
- Generation of detailed risk reports and recommendations for prioritizing fixes
- Monitoring code base changes for security regressions
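To make the first item in the list concrete, here is a minimal sketch of what "catching common vulnerability patterns" can look like at its simplest. It is not how Claude agents or any real scanner work internally; the rule names, the regular expressions, and the `scan` function are all assumptions chosen for illustration, and production tools use far richer analysis.

```python
import re

# Hypothetical, deliberately simple rules; real scanners use much richer rule sets.
PATTERNS = {
    # A variable whose name contains "password", "secret", etc. assigned a string literal.
    "hardcoded-secret": re.compile(
        r"""(?i)\w*(password|secret|api_key|token)\w*\s*=\s*["'][^"']+["']"""
    ),
    # A string literal containing a SQL keyword, followed by + or % (string building).
    "sql-string-concat": re.compile(
        r"""(?i)["'][^"']*\b(select|insert|update|delete)\b[^"']*["']\s*(\+|%)"""
    ),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


SAMPLE = '''\
db_password = "hunter2"
query = "SELECT * FROM users WHERE id = " + user_id
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
'''

if __name__ == "__main__":
    for lineno, rule in scan(SAMPLE):
        print(f"line {lineno}: {rule}")
```

On the sample, the sketch flags the first two lines and leaves the parameterized query on the third line alone. It also has obvious blind spots: a query assembled with an f-string, for example, passes unnoticed.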
On paper, it sounds convincing. On closer inspection, the picture becomes more complicated.
Where Agents Fall Short
The main problem is quality on edge cases. Agents hallucinate vulnerabilities that do not exist and, at the same time, miss real problems hidden in non-standard code or application-specific business logic. In cybersecurity, both kinds of error are costly. A false positive wastes the team's time on investigating a non-existent threat. A false negative leaves a real hole open while creating a false sense of security. The second scenario is worse than having no check at all, because it dulls the team's vigilance.
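The two error types map onto the standard precision and recall metrics, which is one way a team could track an agent's reliability against manually confirmed findings. The sketch below is only an illustration: the `ScanOutcome` class is hypothetical, and the numbers are invented, not taken from the Anthropic research.

```python
from dataclasses import dataclass


@dataclass
class ScanOutcome:
    true_positives: int   # real vulnerabilities the agent reported
    false_positives: int  # reported issues that turned out not to exist
    false_negatives: int  # real vulnerabilities the agent missed

    @property
    def precision(self) -> float:
        """Share of reported findings that were real."""
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def recall(self) -> float:
        """Share of real vulnerabilities that were reported."""
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0


# Illustrative numbers only.
outcome = ScanOutcome(true_positives=8, false_positives=4, false_negatives=2)
print(f"precision={outcome.precision:.2f} recall={outcome.recall:.2f}")
```

Low precision shows up as wasted investigation time. Low recall is the quieter problem the article warns about, since missed findings produce no signal at all unless someone checks independently.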
Another weakness is limited system context. The agent only sees what it is provided. Vulnerabilities tied to the interaction of multiple components, deployment specifics, or a particular cloud environment often go unnoticed — they require understanding the entire architecture, not individual files.
Access Permissions — A Separate Problem
A serious question that rarely surfaces in marketing materials: what permissions does an agent need to work effectively in a security context? Full infrastructure scanning requires elevated privileges, and elevated privileges are themselves an attack vector: if the agent is compromised or makes a consequential mistake, the scale of the damage grows quickly. Anthropic recommends the principle of least privilege and isolated environments for agent-based security tasks. But proper configuration takes additional engineering work, and under tight deadlines it is often skipped. The result is exactly the kind of hole the agent was supposed to close.
"Agents are not a replacement for security specialists, but a tool to accelerate their work" is a key thesis of the Anthropic research.
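The least-privilege recommendation can be sketched as a deny-by-default allowlist around whatever actions an agent may invoke. This is a minimal illustration under stated assumptions, not a description of Anthropic's tooling: `AgentPolicy`, `run_action`, and the action names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Deny-by-default policy: only explicitly listed actions are permitted."""
    allowed_actions: frozenset = field(default_factory=frozenset)

    def check(self, action: str) -> bool:
        return action in self.allowed_actions


def run_action(policy: AgentPolicy, action: str) -> str:
    """Refuse anything outside the allowlist before doing any work."""
    if not policy.check(action):
        raise PermissionError(f"action not permitted: {action}")
    # A real system would dispatch to an isolated executor here.
    return f"executed {action}"


# Hypothetical action names for a read-only code review task.
REVIEW_POLICY = AgentPolicy(frozenset({"read_repo", "write_report"}))

if __name__ == "__main__":
    print(run_action(REVIEW_POLICY, "read_repo"))
    try:
        run_action(REVIEW_POLICY, "modify_firewall")
    except PermissionError as exc:
        print(exc)
```

The design choice worth noting is that the policy starts empty: a task gets only the actions someone deliberately granted it, so a compromised or mistaken agent cannot reach anything that was never listed.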
What This Means
AI agents already provide value in routine security tasks: standard code review, initial scanning, report preparation. But entrusting them with autonomous authority over critical infrastructure is premature for now. The main conclusion, which Edgar Sipki also draws, is that agents change the security team's toolkit, not its composition. A human in the loop remains mandatory, especially where the cost of error is high.