OpenAI revealed the source of the "gremlins" in ChatGPT and showed how to remove the restriction in Codex
AI-processed from 3DNews AI; edited by Hamidun News
OpenAI has publicly explained a peculiar quirk of its models: they had started inserting goblins, gremlins, and other creatures into responses far too often. The company traced this not to internet memes but to a specific training signal tied to the Nerdy personality mode, and it even showed how to temporarily disable the protective filter in Codex.
Where did the creatures come from
On April 29, 2026, OpenAI published a detailed breakdown of why GPT models began overusing creature words in metaphors. According to the company's internal observations, a notable shift began after the launch of GPT-5.1: users complained about an overly familiar tone, and researchers noticed recurring verbal tics.
When the team checked the statistics, it found that the frequency of the word "goblin" in ChatGPT responses had risen by 175%, and "gremlin" by 52%. By GPT-5.4 the problem had become even more pronounced, and the analysis revealed an important detail: the spike was strongly associated with the Nerdy personality mode. That style accounted for only about 2.5% of ChatGPT responses but for 66.7% of all "goblin" mentions. For OpenAI, this argued against the simple explanation that the model had merely absorbed internet slang: if general online culture were the cause, such words would be spread far more evenly across traffic.
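The two shares quoted above are enough to see how lopsided the distribution was. The sketch below is a back-of-the-envelope calculation, assuming both percentages describe the same body of traffic. It computes how many times more "goblin" mentions the Nerdy slice carried than its traffic share alone would predict, and how its per-response mention rate compares with all other modes. The function names are illustrative, not from OpenAI's breakdown.

```python
# Shares reported by OpenAI for ChatGPT traffic.
# Assumption: both figures refer to the same period and the same traffic.
NERDY_SHARE_OF_RESPONSES = 0.025  # Nerdy mode produced ~2.5% of responses
NERDY_SHARE_OF_GOBLINS = 0.667    # ...but 66.7% of all "goblin" mentions


def concentration(share_of_mentions: float, share_of_traffic: float) -> float:
    """How many times more mentions a slice carries than its traffic share predicts.
    A value of 1.0 means the word is spread evenly across traffic."""
    return share_of_mentions / share_of_traffic


def rate_ratio(share_of_mentions: float, share_of_traffic: float) -> float:
    """Mentions per response inside the slice divided by mentions per response outside it."""
    inside = share_of_mentions / share_of_traffic
    outside = (1 - share_of_mentions) / (1 - share_of_traffic)
    return inside / outside


print(f"Nerdy over-representation: "
      f"{concentration(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES):.1f}x")
print(f"Nerdy vs. all other modes: "
      f"{rate_ratio(NERDY_SHARE_OF_GOBLINS, NERDY_SHARE_OF_RESPONSES):.1f}x")
```

With these figures the Nerdy slice carries about 26.7 times its "fair share" of goblins, and roughly 78 times as many mentions per response as the rest of traffic. An evenly spread word would score 1.0 on both measures, which is what the internet-slang explanation would have predicted.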
Why the habit became entrenched
OpenAI explains that the root of the problem lay in the reward setup used during reinforcement learning. The model, tuned for a playful and deliberately "nerdy" style, received extra reward for responses featuring such imagery. An internal audit showed that a dedicated reward signal for Nerdy rated variants containing "goblin" or "gremlin" higher than neutral phrasings in 76.2% of the datasets checked.
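An audit like the one described comes down to scoring matched pairs of responses, one with a creature word and one neutral, and counting the datasets in which the creature variant wins on average. The sketch below illustrates that procedure under loud assumptions. `toy_nerdy_reward` is a hypothetical stand-in with a built-in creature bonus, not OpenAI's reward model. The pairs and the per-dataset averaging rule are also invented for illustration.

```python
import re
from statistics import mean

CREATURE_RE = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)


def toy_nerdy_reward(text: str) -> float:
    """Hypothetical style reward: a crude length-capped quality proxy plus a
    fixed bonus whenever a creature word appears. Not a real reward model."""
    base = min(len(text.split()), 20) / 20
    bonus = 0.3 if CREATURE_RE.search(text) else 0.0
    return base + bonus


def audit(datasets, reward_fn) -> float:
    """Fraction of datasets in which creature-flavored variants score higher,
    on average, than their neutral counterparts."""
    wins = 0
    for pairs in datasets:  # each pair: (creature_variant, neutral_variant)
        creature = mean(reward_fn(c) for c, _ in pairs)
        neutral = mean(reward_fn(n) for _, n in pairs)
        if creature > neutral:
            wins += 1
    return wins / len(datasets)


# Two tiny illustrative "datasets" of matched response pairs.
datasets = [
    [("A gremlin in the cache ate your update.",
      "A stale cache entry hid your update.")],
    [("The bug is a goblin hiding in the parser.",
      "The bug is in the parser's tokenizer step, which drops trailing whitespace, "
      "so the final token never reaches the lexer at all.")],
]
print(f"creature variant preferred in {audit(datasets, toy_nerdy_reward):.0%} of datasets")
```

In the second dataset the much more detailed neutral answer outscores the creature bonus, so the toy audit lands at 50%. A real audit would run the actual reward model over far larger samples.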
Next came an unpleasant side effect: a verbal tic that worked in one mode began to spill over into others. OpenAI describes this as a feedback loop: the desired style is reinforced, the characteristic words become entrenched along with it, they then appear more often in new rollout responses, and those responses return to the training data between fine-tuning stages. As a result, the model carried a narrow stylistic trick into its general behavior, even though the trick was meant to stay within a single personality setting.
In simplified form, the chain looked like this:
- Playful style received rewards
- Along with it, characteristic words also received rewards
- Such formulations appeared more frequently in new rollout responses
- These responses were reused in SFT and preference data
- The model increasingly reproduced the same pattern outside of Nerdy
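The five steps above can be turned into a toy simulation. Every parameter here is a hypothetical knob, not a value from OpenAI: the reward bonus, the share of rollouts generated in Nerdy, and the fraction of the next round's data recycled from fresh rollouts. The point is structural. Only Nerdy rollouts get the bonus, yet all rollouts feed a single model, so the global creature rate `p` keeps rising across every mode.

```python
def simulate_feedback_loop(rounds: int, p0: float = 0.01, bonus: float = 0.5,
                           nerdy_share: float = 0.1, reuse: float = 0.5) -> list[float]:
    """Toy model of the loop (hypothetical parameters).

    p           -- fraction of responses, in any mode, that use a creature word
    bonus       -- relative reward boost for creature responses in Nerdy rollouts
    nerdy_share -- fraction of rollouts generated in the Nerdy mode
    reuse       -- fraction of the next round's training data taken from fresh rollouts
    """
    p = p0
    history = [p]
    for _ in range(rounds):
        # Steps 1-2: in Nerdy, rewarded creature responses are upweighted by (1 + bonus).
        nerdy_rate = p * (1 + bonus) / (p * (1 + bonus) + (1 - p))
        # Step 3: the mix of new rollouts; other modes get no bonus and keep rate p.
        rollout_rate = nerdy_share * nerdy_rate + (1 - nerdy_share) * p
        # Steps 4-5: rollouts are recycled into SFT/preference data, and the next
        # model, shared by all modes, imitates that blended rate.
        p = reuse * rollout_rate + (1 - reuse) * p
        history.append(p)
    return history


for i, p in enumerate(simulate_feedback_loop(10)):
    print(f"round {i:2d}: {p:.2%} of responses mention a creature")
```

With `bonus=0.0` the rate stays flat at `p0`. Any positive bonus makes it climb every round, even though only a tenth of rollouts ever saw the reward.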
A search through the SFT data for GPT-5.5 showed that the issue involved more than two words. Other creatures turned up in the training examples too: raccoons, trolls, ogres, and pigeons. This explains why the Codex filter looked so oddly detailed: it targeted not a single meme but a whole family of accidentally entrenched verbal habits that the model had absorbed into its normal style, even in work-related responses.
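A search of this kind is essentially a word-boundary regex over the assistant side of each training example. The sketch below assumes a hypothetical schema in which each SFT example is a dict with an `"assistant"` string. The creature list is taken from the words named in the article, and plurals are folded into the singular form so the counts group naturally.

```python
import re
from collections import Counter

# Creatures named in OpenAI's breakdown; extend as new tics are discovered.
CREATURES = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
PATTERN = re.compile(r"\b(" + "|".join(CREATURES) + r")s?\b", re.IGNORECASE)


def creature_counts(examples) -> Counter:
    """Count creature-word hits in assistant replies.
    Assumes each example is a dict with an 'assistant' string (hypothetical schema)."""
    counts = Counter()
    for ex in examples:
        for match in PATTERN.finditer(ex["assistant"]):
            counts[match.group(1).lower()] += 1  # "Goblins" -> "goblin"
    return counts


sft = [
    {"assistant": "Your flaky test has a gremlin: the clock isn't mocked."},
    {"assistant": "Think of the cache as raccoons hoarding stale data. Goblins, really."},
    {"assistant": "Set the timeout to 30 seconds."},
]
print(creature_counts(sft).most_common())
```

Word boundaries keep related words like "trolling" from being counted as "troll", which matters when the goal is to measure a specific verbal habit rather than a topic.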
How OpenAI is fixing it
After the launch of GPT-5.4, the company retired the Nerdy personality mode in March 2026 and then removed the reward signal that had been pushing the model toward such metaphors. In parallel, OpenAI began filtering training data containing these words to reduce the chance of them appearing where they don't belong. However, GPT-5.5 had already begun training before the team found the root cause, so remnants of the problem made their way into Codex, OpenAI's programming tool built on the new model.
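OpenAI does not spell out its filtering rule. The sketch below shows one plausible, hypothetical policy: drop a training example when the assistant reply mentions a creature the user never brought up, but keep examples where the prompt itself asks about one (a story about a goblin is legitimate). The `"user"`/`"assistant"` schema is also an assumption.

```python
import re

CREATURE_RE = re.compile(r"\b(goblin|gremlin|raccoon|troll|ogre|pigeon)s?\b",
                         re.IGNORECASE)


def creatures_in(text: str) -> set[str]:
    """Lower-cased creature words found in a piece of text."""
    return {m.lower() for m in CREATURE_RE.findall(text)}


def keep_example(example: dict) -> bool:
    """Hypothetical rule: keep an example only if every creature in the reply
    was already mentioned in the user's prompt."""
    return creatures_in(example["assistant"]) <= creatures_in(example["user"])


def filter_dataset(examples: list[dict]) -> list[dict]:
    return [ex for ex in examples if keep_example(ex)]


data = [
    {"user": "Why does my build fail?", "assistant": "A gremlin in your lockfile."},
    {"user": "Write a short story about a goblin.", "assistant": "Once upon a time, a goblin..."},
    {"user": "Fix this regex.", "assistant": "Escape the dot."},
]
kept = filter_dataset(data)
print(f"kept {len(kept)} of {len(data)} examples")
```

A blunter filter that drops every example containing these words would be simpler, but it would also erase the legitimate cases, teaching the model to avoid the words even when a user explicitly asks for them.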
"At first it seemed funny, but the number of employee complaints
became alarming."
During early Codex testing, employees saw the same manner of speech again, so OpenAI added a direct instruction to the developer prompt telling the model not to mention such creatures unless clearly necessary. The story didn't end there, though: in its breakdown, the company also published the command that runs Codex without this suppressing instruction. In other words, OpenAI not only acknowledged the strange bug publicly but effectively let enthusiasts bring it back for experiments.
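This article does not reproduce OpenAI's actual command or prompt wording. The sketch below only illustrates the general mechanism: a developer prompt assembled from a base instruction plus an optional guard sentence, and a simple mention-rate metric for comparing outputs collected with and without the guard. `BASE_INSTRUCTIONS`, `CREATURE_GUARD`, `build_developer_prompt`, and `mention_rate` are all hypothetical names, and the guard text is a paraphrase of the behavior described above.

```python
import re

BASE_INSTRUCTIONS = "You are a coding assistant. Be concise and precise."
# Hypothetical paraphrase of the suppressing instruction; not OpenAI's wording.
CREATURE_GUARD = ("Do not mention goblins, gremlins, raccoons, trolls, ogres, pigeons "
                  "or similar creatures unless the user's request clearly calls for it.")

CREATURE_RE = re.compile(r"\b(goblin|gremlin|raccoon|troll|ogre|pigeon)s?\b",
                         re.IGNORECASE)


def build_developer_prompt(suppress_creatures: bool = True) -> str:
    """Assemble the developer prompt, optionally appending the creature guard."""
    parts = [BASE_INSTRUCTIONS]
    if suppress_creatures:
        parts.append(CREATURE_GUARD)
    return "\n\n".join(parts)


def mention_rate(outputs: list[str]) -> float:
    """Share of outputs containing at least one creature word."""
    if not outputs:
        return 0.0
    return sum(1 for text in outputs if CREATURE_RE.search(text)) / len(outputs)


print(build_developer_prompt(suppress_creatures=False))
print(mention_rate(["a gremlin", "fine", "ok", "ogres"]))
```

Comparing `mention_rate` across runs with the guard on and off is a cheap way to check whether a prompt-level patch is actually masking an underlying training habit.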
What this means
The story of the "gremlins" is a good example of how a small reward signal can distort a large model's behavior more than benchmarks suggest. For developers, it is a reminder that personalization and style modes need to be checked not only for usefulness but also for the verbal habits they may inadvertently spread across the entire system.
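One simple way to run such a check is to compare word frequencies in a style mode's outputs against baseline outputs and rank the words that are most over-represented. The sketch below uses an add-one-smoothed log frequency ratio, a common lightweight choice and not OpenAI's method. More careful statistics exist, and a real check would use large samples rather than the handful of illustrative sentences here.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def word_counts(texts: list[str]) -> Counter:
    counts = Counter()
    for text in texts:
        counts.update(TOKEN_RE.findall(text.lower()))
    return counts


def overrepresented_words(mode_texts, baseline_texts, min_count=2, top=5):
    """Rank words by smoothed log frequency ratio (mode vs. baseline).
    Add-one smoothing keeps words absent from the baseline from dividing by zero."""
    mode, base = word_counts(mode_texts), word_counts(baseline_texts)
    vocab = set(mode) | set(base)
    mode_total = sum(mode.values()) + len(vocab)
    base_total = sum(base.values()) + len(vocab)
    scores = {}
    for word in vocab:
        if mode[word] < min_count:  # ignore one-off words in the mode sample
            continue
        scores[word] = (math.log((mode[word] + 1) / mode_total)
                        - math.log((base[word] + 1) / base_total))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


mode_texts = ["The goblin in the parser strikes again.",
              "Classic goblin behavior from the cache.",
              "A goblin ate the semicolon."]
baseline_texts = ["The parser drops the last token.",
                  "The cache is stale.",
                  "Add the missing semicolon."]
print(overrepresented_words(mode_texts, baseline_texts))
```

Running the same comparison on outputs from other modes would reveal the spillover the article describes: a word that first surfaces as a tic in one mode starts ranking high elsewhere too.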