OpenAI revealed the source of the "gremlins" in ChatGPT and showed how to remove the restriction in Codex


Originally published on 3DNews AI. Hamidun News processes and adapts the material with AI.


Apr 30, 2026. Reading time: 3 min.

OpenAI acknowledged a strange bug in GPT behavior: starting with GPT-5.1, models increasingly inserted goblins, gremlins, and other creatures into responses…

Hamidun News Editorial

Source: 3DNews AI. Collage: Hamidun News.


OpenAI has publicly explained a quirk of its models: they began inserting goblins, gremlins, and other creatures into responses far too often. The company traced this not to internet memes but to a specific training signal in the Nerdy personality mode, and it even showed how to temporarily remove the protective filter in Codex.

Where the creatures came from

On April 29, 2026, OpenAI released a detailed breakdown of why GPT models began overusing such words in metaphors. According to the company's internal observations, a notable shift began after the launch of GPT-5.1: users complained about an overly familiar tone, and researchers noted recurring verbal tics.

When the team checked the statistics, it turned out that the frequency of the word "goblin" in ChatGPT had increased by 175%, and "gremlin" by 52%. With GPT-5.4, the problem became even more pronounced, and analysis revealed an important detail: the spike was strongly associated with the Nerdy personality mode. This style was used in only about 2.5% of ChatGPT responses, but accounted for 66.7% of all "goblin" mentions. For OpenAI, this became an argument against the simple explanation that internet slang was to blame. If it were merely a matter of general internet culture, such words would be distributed much more evenly across traffic.
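To make the skew concrete, here is a minimal sketch of the arithmetic behind the 2.5% and 66.7% figures. It assumes, for illustration only, that the share of mentions can be treated as a share of responses; the function names are hypothetical and not taken from OpenAI's breakdown.

```python
# The two shares are quoted in the article; everything else is illustrative.
NERDY_SHARE_OF_RESPONSES = 0.025  # Nerdy produced about 2.5% of responses
NERDY_SHARE_OF_GOBLINS = 0.667    # ...but 66.7% of "goblin" mentions


def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """How many times more mentions a segment has than its traffic share predicts."""
    return share_of_mentions / share_of_responses


def relative_rate(share_of_mentions: float, share_of_responses: float) -> float:
    """Per-response mention rate inside the segment divided by the rate outside it."""
    inside = share_of_mentions / share_of_responses
    outside = (1.0 - share_of_mentions) / (1.0 - share_of_responses)
    return inside / outside


lift = overrepresentation(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES)
ratio = relative_rate(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES)

print(f"Nerdy holds {lift:.1f}x its traffic share of 'goblin' mentions")
print(f"Per response, Nerdy mentions goblins about {ratio:.0f}x as often as other modes")
```

Under these assumptions, Nerdy carries roughly 27 times its traffic share of mentions, and a Nerdy response is roughly 78 times as likely to mention a goblin as a response from any other mode. An evenly spread internet meme would put both numbers close to 1.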

Why the habit became entrenched

OpenAI explains that the root of the problem lay in the reward system used during reinforcement learning. The model, tuned for a playful and deliberately "nerdy" style, received extra reward for responses featuring such imagery. An internal audit showed that a dedicated reward signal for Nerdy rated variants containing "goblin" or "gremlin" higher than neutral wording in 76.2% of the datasets checked.
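The article does not describe how the audit was computed. As one possible reading, the sketch below takes paired reward scores (a creature variant and a neutral variant of the same response) and reports the share of datasets in which the creature variants score higher on average. All dataset names and scores are made up for illustration.

```python
from statistics import mean

# Hypothetical audit records: (creature_variant_score, neutral_variant_score).
# These numbers are invented for the example, not taken from OpenAI.
toy_audit = {
    "dataset_a": [(0.81, 0.74), (0.66, 0.70), (0.90, 0.85)],
    "dataset_b": [(0.55, 0.50), (0.62, 0.60)],
    "dataset_c": [(0.40, 0.58), (0.45, 0.52)],
    "dataset_d": [(0.77, 0.71), (0.80, 0.79)],
}


def creature_preferred(pairs):
    """True if creature variants score higher on average than neutral ones."""
    creature_mean = mean(creature for creature, _ in pairs)
    neutral_mean = mean(neutral for _, neutral in pairs)
    return creature_mean > neutral_mean


def share_preferring_creatures(audit):
    """Fraction of datasets in which the reward signal favors creature variants."""
    flags = [creature_preferred(pairs) for pairs in audit.values()]
    return sum(flags) / len(flags)


share = share_preferring_creatures(toy_audit)
print(f"Creature variants preferred in {share:.1%} of toy datasets")
```

A share well above 50%, like the 76.2% reported for the real audit, suggests the reward signal systematically favors the creature wording rather than treating it as neutral.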

Next came an unpleasant side effect: a verbal tic that was rewarded in one mode began to spill over into other scenarios. OpenAI describes this as a feedback loop: first the desired style is reinforced, then characteristic words become entrenched along with it, then they appear more and more often in new rollout responses and return to the training data between fine-tuning stages. As a result, the model carried a specific stylistic device into its broader behavior, although it was originally meant to stay within a single personality setting.

In simplified form, the chain looked like this:

1. Playful style received rewards
2. Along with it, characteristic words also received rewards
3. Such formulations appeared more frequently in new rollout responses
4. These responses were reused in SFT and preference data
5. The model increasingly reproduced the same pattern outside of Nerdy
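The chain above can be sketched as a toy simulation. It is not OpenAI's training pipeline: it assumes a single number (the frequency of the tic in rollouts), a hypothetical `reward_boost` that makes tic-bearing responses more likely to be kept as training data, and a model that simply reproduces the frequency of its latest training set.

```python
def next_frequency(p: float, reward_boost: float) -> float:
    """One round of the loop: rollouts contain the tic with probability p, and
    responses with the tic are reward_boost times more likely to be reused."""
    kept_with_tic = p * reward_boost
    kept_without_tic = 1.0 - p
    return kept_with_tic / (kept_with_tic + kept_without_tic)


def simulate(p0: float, reward_boost: float, rounds: int) -> list:
    """Frequency of the tic after each round, starting from p0."""
    history = [p0]
    for _ in range(rounds):
        history.append(next_frequency(history[-1], reward_boost))
    return history


# Illustrative parameters only: 1% starting frequency, 1.5x boost, 6 rounds.
history = simulate(p0=0.01, reward_boost=1.5, rounds=6)
for round_no, p in enumerate(history):
    print(f"round {round_no}: tic frequency {p:.3f}")
```

Even a modest boost compounds from round to round, because each round's output becomes the next round's input. With `reward_boost=1.0` the frequency stays flat, which corresponds to removing the reward signal.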

A search through SFT data for GPT-5.5 showed that the issue involved more than just two words. Other creatures appeared in the training examples: raccoons, trolls, ogres, and pigeons. This is why the filter in Codex looked so strange and detailed: it blocked not a single meme, but an entire family of accidentally entrenched verbal habits that the model had made part of its normal style even in work-related responses.
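A search of this kind can be approximated with a word-boundary match over training examples. The word list below comes from the article; the matching rules, function names, and sample texts are assumptions, since the actual search and filter are not published here.

```python
import re
from collections import Counter

# Word family named in the article; the real list may differ.
CREATURES = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")

# \b keeps "controller" and "progressive" from matching "troll" and "ogre".
CREATURE_RE = re.compile(r"\b(" + "|".join(CREATURES) + r")s?\b", re.IGNORECASE)


def count_creatures(examples):
    """Count how often each creature word appears across the examples."""
    counts = Counter()
    for text in examples:
        for match in CREATURE_RE.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


def drop_creature_examples(examples):
    """Keep only the examples that mention none of the creature words."""
    return [text for text in examples if not CREATURE_RE.search(text)]


toy_examples = [
    "A gremlin in the cache keeps evicting your hot keys.",
    "The controller uses a progressive rollout.",
    "Two goblins and a raccoon walk into a merge conflict.",
]

print(count_creatures(toy_examples))
print(drop_creature_examples(toy_examples))
```

Matching on word boundaries matters in technical text, where the creature names often occur inside ordinary words. A plain substring search would flag the second example even though it mentions no creature at all.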

How OpenAI is fixing it

After the launch of GPT-5.4, the company removed the Nerdy personality mode in March 2026, then removed the reward signal that had been pushing the model toward such metaphors. In parallel, OpenAI began filtering training data containing these words to reduce the chance of their appearing where they did not belong. However, GPT-5.5 was already being trained before the team traced the root cause, so traces of the problem made their way into Codex, a coding tool built on the new model.

"At first it seemed funny, but the number of employee complaints became alarming."

During early tests of Codex, employees saw the same manner of speech again, and OpenAI added a direct instruction to the developer prompt not to mention such creatures unless explicitly necessary. But the story didn't end there: in its breakdown, the company also published the command that runs Codex without this suppressing instruction. In other words, OpenAI not only publicly acknowledged the strange bug but also let enthusiasts bring it back for experiments.

What this means

The story of the "gremlins" is a good example of how a small reward signal can damage the behavior of a large model more severely than benchmarks suggest. For developers, it's a reminder: personalization and stylistic modes need to be checked not just for usefulness, but also for which verbal habits they inadvertently spread throughout the entire system.
