LLMs are stuck in formulaic thinking, and a startup aims to fix it
AI-processed from MIT Technology Review; edited by Hamidun News
Claude, ChatGPT, and Gemini give equally predictable responses to similar queries. On July 1, 2026, MIT Technology Review described this phenomenon as a systemic "groupthink" among language models and reported on a startup working to overcome it.
The Number Test: Why This Is Not a Coincidence
Ask any popular chatbot to name a random number between 1 and 10 — you'll almost certainly get 7. Ask again — you'll hear 3 or 4, then 8 or 9. The pattern reproduces with striking consistency across different models from different companies.
The explanation is straightforward: all major LLMs were trained on similar web corpora, where "7" as an answer to this question appears more frequently than other numbers — people themselves call seven "the most random" number. Reinforcement learning from human feedback (RLHF) further encourages "safe" and expected answers: those that more often receive high ratings from human raters. Models are literally trained to give a predictable response.
- Seven as a "random" number is a textbook example of LLM template thinking
- The pattern is characteristic of Claude (Anthropic), ChatGPT (OpenAI), and Gemini (Google DeepMind)
- The reason is overlapping training data and similar RLHF procedures across all major labs
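One way to make the number test measurable is to collect many answers and compare their Shannon entropy with that of a truly uniform draw. The sketch below is illustrative only: `biased_model` is a hypothetical stand-in for a chatbot, and its weights are invented for the example rather than measured from any real model.

```python
import math
import random
from collections import Counter


def shannon_entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def biased_model(rng):
    """Hypothetical stand-in for a chatbot asked for a number from 1 to 10.

    The weights are made up for illustration; they only encode the
    article's claim that 7 is strongly preferred.
    """
    numbers = list(range(1, 11))
    weights = [1, 1, 3, 3, 1, 1, 12, 2, 2, 1]
    return rng.choices(numbers, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
model_answers = [biased_model(rng) for _ in range(1000)]
uniform_answers = [rng.randint(1, 10) for _ in range(1000)]

max_entropy = math.log2(10)  # upper bound for ten equally likely answers
model_entropy = shannon_entropy_bits(model_answers)
uniform_entropy = shannon_entropy_bits(uniform_answers)
most_common_answer, _ = Counter(model_answers).most_common(1)[0]

print(f"most common simulated answer: {most_common_answer}")
print(f"simulated model entropy: {model_entropy:.2f} bits")
print(f"uniform draw entropy:    {uniform_entropy:.2f} bits")
print(f"theoretical maximum:     {max_entropy:.2f} bits")
```

To run the same check against a real chatbot, `biased_model` would be replaced with a function that sends the prompt and parses the reply. The further the measured entropy falls below the maximum, the more formulaic the answers.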
Where Groupthink Causes Real Harm
Numbers are merely a visible symptom. In real tasks, the problem is larger: models reproduce the same cultural clichés, formulate strategic recommendations in similar ways, and offer comparable marketing solutions. When a company uses multiple LLMs to "diversify perspectives," it often gets paraphrased versions of the same opinion that only look independent.
"We've created an ecosystem where all models see the world the same way, because they read the same thing," MIT Technology Review states.
The problem is especially acute where originality matters: scientific hypothesis generation, unconventional content, assessment of non-trivial risks. "Independent verification" through multiple LLMs in such cases creates an illusion of diversity — but not actual diversity.
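A team that polls several models can get a rough sense of how independent the answers really are by measuring their overlap. The sketch below uses mean pairwise Jaccard similarity over word sets. This is a deliberately crude proxy chosen for the example, not a method described in the source: it catches near-identical wording but misses paraphrases that use different words. The model names and answers are invented placeholders.

```python
from itertools import combinations


def word_set(text):
    """Lowercased words with surrounding punctuation stripped."""
    words = {w.strip(".,;:!?\"'()") for w in text.lower().split()}
    words.discard("")
    return words


def jaccard_similarity(a, b):
    """Share of words two answers have in common (1.0 means identical word sets)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def mean_pairwise_similarity(answers):
    """Average Jaccard similarity over every pair of answers."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        raise ValueError("need at least two answers to compare")
    return sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)


# Invented placeholder answers, standing in for replies from three different models.
answers_by_model = {
    "model_a": "Focus on customer retention and invest in brand storytelling.",
    "model_b": "Invest in brand storytelling and focus on customer retention.",
    "model_c": "Prioritize customer retention and invest in brand storytelling.",
}

score = mean_pairwise_similarity(list(answers_by_model.values()))
print(f"mean pairwise similarity: {score:.2f}")
```

A score close to 1.0 suggests the "second opinions" are largely the same opinion. An embedding-based similarity would be a natural replacement when paraphrase detection matters.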
What the Startup Proposes
MIT Technology Review describes a startup focused on methods to overcome "template thinking" in language models. The exact architecture of the solution is not disclosed. The industry, meanwhile, is discussing several approaches to this challenge:
- Training on more diverse data with deliberate inclusion of niche perspectives
- Managed stochasticity at the fine-tuning stage — encouraging variability as an explicit goal
- Ensemble systems where multiple models with different "biases" debate each other
- Diversity metrics for answers as a mandatory part of evals — alongside accuracy and safety
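The last item in the list, diversity as part of evals, can be sketched as a simple gate that reports a diversity score next to accuracy. The example below uses the distinct-n ratio (unique n-grams divided by total n-grams across sampled responses) as the diversity measure. That choice, the thresholds, and the names `evaluate` and `EvalResult` are assumptions made for illustration, not the startup's method, which the source says is undisclosed.

```python
from dataclasses import dataclass


def distinct_n(responses, n=2):
    """Unique n-grams divided by total n-grams across all responses."""
    total = 0
    unique = set()
    for response in responses:
        words = response.lower().split()
        grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


@dataclass
class EvalResult:
    accuracy: float
    diversity: float
    passed: bool


def evaluate(responses, is_correct, min_accuracy=0.8, min_diversity=0.5, n=2):
    """Pass only if the responses are both accurate enough and varied enough.

    The thresholds are arbitrary illustrative defaults.
    """
    if not responses:
        raise ValueError("need at least one response")
    accuracy = sum(1 for r in responses if is_correct(r)) / len(responses)
    diversity = distinct_n(responses, n)
    passed = accuracy >= min_accuracy and diversity >= min_diversity
    return EvalResult(accuracy, diversity, passed)


def mentions_ads(response):
    """Toy correctness check for the example prompt."""
    return "ads" in response.lower().split()


# Invented sample responses to the same prompt.
repetitive = ["use social media ads"] * 4
varied = [
    "run social media ads",
    "sponsor local podcast ads",
    "buy niche newsletter ads",
    "test outdoor billboard ads",
]

repetitive_result = evaluate(repetitive, mentions_ads)
varied_result = evaluate(varied, mentions_ads)
print(repetitive_result)
print(varied_result)
```

Both sample sets are fully "accurate" under the toy check, but only the varied one clears the diversity threshold, which is the distinction an accuracy-only eval would miss.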
What This Means
If methods for overcoming this "group consensus" become an industry standard, they will change how AI systems are evaluated: diversity and independence of responses will become measurable requirements on par with accuracy and safety. For corporate users, this opens the possibility of getting genuinely different perspectives from AI, rather than a statistically averaged viewpoint in different formulations.