Why bots based on protected LLMs are frequently hacked: analysis of 14,000 GPTs
Base LLMs are protected from attacks. But bots built on them are vulnerable. The culprit is the orchestration layer: system prompts, RAG, tools, webhooks. Analy

A secure base LLM model is not a guarantee of a secure bot. A paradox? No, just architecture. When you take a protected model like GPT or Claude and wrap it in a system prompt, add RAG, tools, and APIs — a new attack surface appears. This is called the orchestration layer, and it's exactly where bots get compromised, even when they're protected at the model level.
How the base is protected
Base LLMs undergo serious safety training: their creators train them to refuse dangerous requests. Teams at OpenAI, Anthropic, and others spend months ensuring the model understands which requests are unsafe. On top of this comes RLHF (reinforcement learning from human feedback) — the model is aligned with human preferences about what is ethical and what isn't. The result: if you directly ask GPT to hack a website or disclose personal data, it will refuse.
Where problems begin
But as soon as you wrap the model in a bot (whether it's a Telegram bot, web application, or AI agent), you add an entire layer of components, each potentially unsafe:
- System prompts — instructions to the bot that sometimes override model training and can be injected
- Dialog memory — history of requests that grows and can be used for contextual attacks
- RAG (Retrieval-Augmented Generation) — external databases and documents that can be poisoned with false data
- Tools and function calls — direct access to APIs, email, databases, payment systems
- Webhook logic and external services — untrusted data sources that can be compromised
Each layer adds a new attack vector. System prompts can be injected through user input. Dialog memory can be cluttered with prompt injection patterns. RAG can return poisoned data from a compromised source. Tools can be used to bypass model restrictions.
What the analysis showed
Researchers from arxiv analyzed 14,904 custom GPTs — public agents created by users on OpenAI's platform. Result: the overwhelming majority are vulnerable to basic orchestration layer attacks. An attacker doesn't need to compromise the model itself — it's enough to inject the system prompt or poison the RAG source.
Most vulnerabilities are not in the model itself, but in how it's wrapped.
This means you can use the most secure LLM on the planet, but improper architecture will negate its benefits. Bots weaken as they scale because each new component adds complexity and new entry points.
What this means
AI bot security is not just about model selection — it's a comprehensive architectural challenge. You need to protect system prompts from injections, validate input data, control RAG sources, restrict tool permissions, and log all actions. Otherwise, a beautiful LLM becomes a beautiful security hole.