Anthropic and other language models can invoke hidden tools without permission
Tool invocation in language models long seemed like a solved problem, but agent systems retain a dangerous flaw: the model can invent a function that doesn't…
AI-processed from Habr AI; edited by Hamidun News
The main risk of agentic LLMs today is not that they sometimes get function parameters wrong, but that with poor architecture they can invoke a tool that was formally never given to them. If such a function exists in the environment, the model is capable of guessing its name, inventing arguments, and achieving a real side effect: reading a secret, writing a message, accessing someone else's API. On paper, access rights look limited, but in practice the boundary between allowed and forbidden becomes probabilistic.
For systems where the model works with data, mail, documents, or corporate integrations, this is no longer a curiosity but a security risk. The problem manifests in the combination of an agentic model and client environment, where the list of permitted tools and the actual namespace diverge. In a demonstration, the model was given only read_url, but in the environment a read_secret function existed.
After a hint like from dialoghelper import *, the model decided it had access to read_secret and tried to invoke that tool. If the library or API checks only the call format and not strictly comparing the function name against the issued schema, the request passes through. The author showed similar behavior not only in Anthropic, but in specific scenarios also in Gemini, Grok, and through multi-provider wrappers.
In other words, the vulnerability arises not at one brand, but at the intersection of model, SDK, and agentic scaffolding. The danger increases sharply when such a system falls into the so-called deadly triad: the model has access to external tools, untrusted content, and private data. Then prompt injection stops being merely an annoying feature and becomes a channel for leakage.
It is enough for an attacker to embed an instruction in an email, webpage, or document, and the model, confident in having a hidden tool, itself reaches for the secret and sends it out. Especially unpleasant is that the architectural separation on which developers rely can give a false sense of security: a sensitive tool seems not to be included in the set, but physically lies nearby in the environment and becomes accessible through an unauthorized call. Detecting such a failure is difficult.
In many cases the model behaves correctly and refuses the forbidden call, then suddenly fails due to a trifle in context: the wording of a hint, message history, or even a module name. The article provides an instructive example where the same action is sometimes blocked, sometimes passes after a harmless mention of dialoghelper. Structured decoding partially reduces the risk because it forces the model to fit into a schema, but there are tradeoffs here too: increased latency, strict limits on the number of strict-tools, and performance degradation on large function sets.
Therefore, relying solely on smart model behavior is impossible. Tool name checking must happen on the provider side and in client code before actual execution. The practical conclusion is simple: if you're building an agent on top of MCP, Jupyter, internal SDKs, or any other dynamic environment, it's not enough to hide extra functions from the description and hope for model discipline.
You need to strictly validate every tool call, separate sensitive operations from untrusted content, and keep the execution space narrower than LLM visibility. Otherwise, a single guessed read_secret or add_msg turns a neat agentic architecture into a system where access rights exist only until the first successful hallucination. The good news is that the fix doesn't look complicated: it's enough to reject any call whose name is absent from the allowed schema before passing it to runtime.
A few lines of defensive code in a library or gateway can close a class of problems that would otherwise be masked as a rare hallucination but in fact breaks the trust model of the entire system.
Want to stop reading about AI and start using it?
AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.