
Goblins in GPT-5.1: how a fantasy habit took over OpenAI's model
Source: Habr AI. Collage: Hamidun News.

A strange and completely unexpected epidemic has broken out in GPT-5.1: the model has begun obsessively mentioning goblins, gremlins, and other fantasy creatures in roughly every third or fourth response. The symptom is not as conspicuous as a math failure or a clinical hallucination, but it is clear and systematic. OpenAI researchers noticed that the trend grows with each new generation of the model, and now suspect it may signal deeper problems in the training process and the propagation of errors between generations.

An Anomaly Without Metric Decline

Usually when something goes wrong in large language models, we see it immediately and clearly: the quality metric drops, answers become complete nonsense, users start complaining. But with goblins it's completely different. One mention of a "little goblin" in a response seems cute and harmless — maybe even funny and witty.

The problem lies in scale: across training generations, such mentions became more and more frequent. At first glance this looks like a minor oddity, hardly worth worrying about. But researchers see in this a symptom of a more serious phenomenon.

Somewhere in its transformer layers, the model "learned" to prefer fantasy metaphors when describing complex computational processes. Little goblins attacked logic, gremlins got tangled in syntax, and all of it read as vivid imagery that was completely out of place.

How the Strange Habit Multiplied Across Generations

The first generation of GPT-5.1 produced a goblin roughly once per hundred responses. Nothing scary, nothing that demanded intervention. But the developers did not clean the training data of the phenomenon, and when the second generation was trained on the outputs of the first, the mention frequency nearly doubled, to roughly once per fifty responses. The third generation invoked fantasy creatures with alarming regularity: about once per ten responses. The fourth mentioned them more often still. The researchers had run into a classic problem of training on model outputs: if an undesirable pattern is present in the data and a new model is trained on an old model's outputs, the pattern can amplify exponentially from generation to generation.
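
The feedback loop is easy to illustrate. Below is a minimal Python sketch of how a rare pattern compounds when each generation is trained on the previous one's outputs; the doubling factor of 2.0 is an assumption chosen to match the reported jump from 1/100 to 1/50, not a measured constant.

    def project_mention_rate(initial_rate: float, factor: float, generations: int) -> list[float]:
        """Project a pattern's per-response frequency across training generations."""
        rates = [initial_rate]
        for _ in range(generations - 1):
            # Each generation inherits the pattern amplified by `factor`;
            # a frequency can never exceed 1 mention per response.
            rates.append(min(1.0, rates[-1] * factor))
        return rates

    # Starting from the reported 1 goblin per 100 responses:
    for gen, rate in enumerate(project_mention_rate(0.01, 2.0, 5), start=1):
        print(f"Generation {gen}: about 1 mention per {1 / rate:.0f} responses")

Five generations at a factor of 2 already take the motif from one response in a hundred to one in six, which is why a pattern that looks harmless in generation one deserves attention well before generation four.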

"Goblins reproduced like a virus, but a benevolent virus — it harmed

no one, it just greeted everyone with a smile," one of the researchers noted.

The problem became acute enough to attract serious attention. The model started producing recommendations like "a little goblin will show you the right path in your database" or "gremlins will help you optimize your algorithm."

Hypotheses About the Origin

Where did these hordes of fantasy creatures come from? Researchers have several competing hypotheses. The first: the training data simply contains an excess of fantasy literature, D&D content, and role-playing games, where goblins have long served as stock metaphors for complex systems. The second points to RLHF (Reinforcement Learning from Human Feedback): perhaps human annotators rated responses with creative goblin usage as "good" and "creative," and that reward left a disproportionate imprint on the model. The third, most interesting hypothesis: the model itself "noticed" the effectiveness of such metaphors and settled on goblins because they are universal. They are familiar enough thanks to video games and pop culture, yet abstract enough to fit any context, from databases to machine learning.

  • Excess of fantasy content in training data
  • Positive RLHF reinforcement for creative explanations
  • The model's independent discovery of metaphor effectiveness
  • Absence of filters in intermediate training generations (see the sketch after this list)
  • Exponential amplification of the pattern when training on outputs
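
The fourth item on this list suggests the simplest countermeasure: screen one generation's outputs for the runaway motif before feeding them to the next. The sketch below is a hypothetical illustration in Python; the regex, the creature list, and the idea that OpenAI's pipeline works this way are all assumptions.

    import re

    # Illustrative pattern; a production filter would need a curated term list.
    FANTASY_PATTERN = re.compile(r"\b(goblins?|gremlins?|imps?|pixies?)\b", re.IGNORECASE)

    def filter_training_samples(samples: list[str]) -> tuple[list[str], float]:
        """Drop samples containing the motif; also report the dropped fraction."""
        clean = [s for s in samples if not FANTASY_PATTERN.search(s)]
        dropped = 1 - len(clean) / len(samples) if samples else 0.0
        return clean, dropped

    outputs = [
        "A little goblin will show you the right path in your database.",
        "Add an index on the join column to speed up this query.",
    ]
    clean, dropped = filter_training_samples(outputs)
    print(f"Kept {len(clean)} of {len(outputs)} samples, dropped {dropped:.0%}")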

What This Means

The story of the goblins in GPT-5.1 is not just an amusing bug or a curiosity. It shows how large language models can develop strange yet persistent habits that remain completely invisible in standard quality metrics. Users might not even notice their responses slowly filling up with fantasy creatures. The case is a reminder that models need not only quantitative evaluation (accuracy, BLEU, human ratings) but also qualitative analysis of trends in model outputs across training generations. Goblins today; who knows what tomorrow.
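
What would such qualitative trend analysis look like in practice? One minimal version is to track a motif's per-response frequency generation over generation and raise an alarm on sustained growth. The sketch below uses the article's reported figures; the 1.5x alert threshold is an assumed value.

    def motif_is_compounding(rates_by_generation: list[float], growth_threshold: float = 1.5) -> bool:
        """Flag a motif whose frequency grows by the threshold between consecutive generations."""
        return any(
            prev > 0 and curr / prev >= growth_threshold
            for prev, curr in zip(rates_by_generation, rates_by_generation[1:])
        )

    # 1/100, 1/50, then roughly 1/10 in generation three, as reported above.
    print(motif_is_compounding([0.01, 0.02, 0.10]))  # True: the habit is compounding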

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.