Why bots based on protected LLMs are frequently hacked: analysis of 14,000 GPTs

Base LLMs are protected from attacks. But bots built on them are vulnerable. The culprit is the orchestration layer: system prompts, RAG, tools, webhooks…

Hamidun News Editorial

AI monitoring · Habr AI

May 19, 2026· 2 min

AI-processed from Habr AI; edited by Hamidun News

Why bots based on protected LLMs are frequently hacked: analysis of 14,000 GPTs — Source: Habr AI. Collage: Hamidun News.

◐ Listen to article

A secure base LLM model is not a guarantee of a secure bot. A paradox? No, just architecture. When you take a protected model like GPT or Claude and wrap it in a system prompt, add RAG, tools, and APIs — a new attack surface appears. This is called the orchestration layer, and it's exactly where bots get compromised, even when they're protected at the model level.

How the base is protected

Base LLMs undergo serious safety training: their creators train them to refuse dangerous requests. Teams at OpenAI, Anthropic, and others spend months ensuring the model understands which requests are unsafe. On top of this comes RLHF (reinforcement learning from human feedback) — the model is aligned with human preferences about what is ethical and what isn't. The result: if you directly ask GPT to hack a website or disclose personal data, it will refuse.

Where problems begin

But as soon as you wrap the model in a bot (whether it's a Telegram bot, web application, or AI agent), you add an entire layer of components, each potentially unsafe:

System prompts — instructions to the bot that sometimes override model training and can be injected
Dialog memory — history of requests that grows and can be used for contextual attacks
RAG (Retrieval-Augmented Generation) — external databases and documents that can be poisoned with false data
Tools and function calls — direct access to APIs, email, databases, payment systems
Webhook logic and external services — untrusted data sources that can be compromised

Each layer adds a new attack vector. System prompts can be injected through user input. Dialog memory can be cluttered with prompt injection patterns. RAG can return poisoned data from a compromised source. Tools can be used to bypass model restrictions.

What the analysis showed

Researchers from arxiv analyzed 14,904 custom GPTs — public agents created by users on OpenAI's platform. Result: the overwhelming majority are vulnerable to basic orchestration layer attacks. An attacker doesn't need to compromise the model itself — it's enough to inject the system prompt or poison the RAG source.

Most vulnerabilities are not in the model itself, but in how it's wrapped.

This means you can use the most secure LLM on the planet, but improper architecture will negate its benefits. Bots weaken as they scale because each new component adds complexity and new entry points.

What this means

AI bot security is not just about model selection — it's a comprehensive architectural challenge. You need to protect system prompts from injections, validate input data, control RAG sources, restrict tool permissions, and log all actions. Otherwise, a beautiful LLM becomes a beautiful security hole.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Need AI working inside your business — not just in your newsfeed?

I build production AI for companies — custom CRM, internal tools, autonomous agents, workflow automation. Owned by you, shaped to your process, no per-seat tax. Built by Zhemal Khamidun, CPO of AlpinaGPT (AI platform, 6,000+ users).

Book a free consultation →