Centaur knew the answers, but didn't understand the questions: research refutes AI 'thinking'
AI-processed from Science Daily AI; edited by Hamidun News
Centaur was presented as a breakthrough: an AI that had supposedly learned to imitate human thinking across 160 cognitive tasks at once, from memory to decision-making. New research disputes that claim: the model was not thinking, it was memorizing patterns.
The Old Debate, a New Player
The question of the nature of human thinking is one of the longest-running in psychology. Is thinking a unified system with a common "engine," or a set of specialized modules, each responsible for its own domain: memory, attention, language, spatial navigation, decision-making? This isn't an academic word game.
The answer determines how to design tests for cognitive abilities, how to treat disorders like dementia or dyslexia, and how to design AI systems that claim to have "understanding." Decades of debate have produced no consensus. Centaur appeared as a possible answer from an unexpected direction.
The idea: train an AI on data about human behavior in cognitive tests, then check whether the model reproduces that behavior. If it does, the reasoning goes, it has "grasped" the structure of thinking. The authors claimed that Centaur performed 160 different cognitive tasks with accuracy comparable to humans. The scale of the claim was unusual: no earlier model had claimed such broad cognitive coverage.
Where the Argument Breaks Down
Researchers who independently examined Centaur found a fundamental problem: the high scores are explained not by "understanding" but by memorization. The specific methodological weaknesses:
- Data overlap: the training sample overlaps with the test set — the model effectively "saw" the correct answers before evaluation
- No transfer: on unfamiliar tasks with the same structure, the model didn't show consistent results
- Pattern without mechanism: high correspondence with human answers doesn't mean reproducing cognitive processes — only their external form
- Weak verification: the authors didn't conduct tests on specially modified tasks where memorized answers wouldn't work
According to critics, this is a classic case: the model knew what to answer without understanding the question.
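The first weakness in the list, overlap between training and test data, is the easiest one to check mechanically. Below is a minimal sketch of such a check, not the critics' actual procedure: the function names (`normalize`, `overlap_report`) and the toy task strings are hypothetical, and a real audit would also need to catch paraphrased or partially matching items.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't hide a match."""
    return " ".join(text.lower().split())


def overlap_report(train_items: list[str], test_items: list[str]) -> dict:
    """Report which test items also appear, after normalization, in the training set."""
    train_seen = {normalize(item) for item in train_items}
    leaked = [item for item in test_items if normalize(item) in train_seen]
    return {
        "leaked": leaked,
        "leak_rate": len(leaked) / len(test_items) if test_items else 0.0,
    }


# Hypothetical toy data, not taken from the Centaur study.
train = ["Choose A or B: A pays 10, B pays 5", "Recall the list: cat, dog, sun"]
test = ["choose A or B:  A pays 10, B pays 5", "Choose A or B: A pays 3, B pays 8"]

report = overlap_report(train, test)
print(report["leak_rate"])  # 0.5: one of the two test items was already in training
```

A non-zero leak rate does not prove that a model memorized answers, but it does mean that scores on the leaked items cannot be read as evidence of generalization.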
Why This Matters
Centaur's claim was not just scientific; it was instrumental. If AI really modeled cognitive processes, it would open a new method for studying the mind: instead of running expensive neuropsychological experiments, researchers could test theories directly on AI models. This would radically accelerate cognitive science.
But in AI evaluation, the line between "memorization" and "thinking" remains hard to draw. Language models regularly post high benchmark scores, and closer examination often shows that those scores reflect reproduction of patterns from the training data, not generalization.
"The model knew what to answer, but didn't know why": this is roughly how critics formulate the essence of the objection to Centaur.
The story here isn't new. Time and again, when an AI scores high on a test of "understanding," a detailed review follows and finds that the test was measuring something else. Centaur fits this pattern.
What This Means
A convincing benchmark and real understanding are different things. Until standards emerge that fundamentally separate pattern memorization from cognitive generalization, claims about "AI that thinks like a human" will remain questionable. For cognitive science, the next step is obvious: develop tests that by definition cannot be passed through memorization.
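One simple way to make that separation concrete is to score a model on the original items and on modified items that preserve the task structure but change the surface form, then compare the two. The sketch below is an illustration under stated assumptions, not an established standard: `accuracy`, `memorization_gap`, the lookup-table "model", and the arithmetic items are all hypothetical stand-ins.

```python
from typing import Callable

Item = tuple[str, str]  # (question, expected answer)


def accuracy(model: Callable[[str], str], items: list[Item]) -> float:
    """Fraction of items the model answers exactly as expected."""
    if not items:
        return 0.0
    correct = sum(1 for question, answer in items if model(question) == answer)
    return correct / len(items)


def memorization_gap(
    model: Callable[[str], str], original: list[Item], modified: list[Item]
) -> float:
    """Accuracy on original items minus accuracy on structurally equivalent modified items."""
    return accuracy(model, original) - accuracy(model, modified)


# Hypothetical "model" that has only memorized exact question strings.
memorized = {"2 + 3": "5", "4 + 1": "5"}


def lookup_model(question: str) -> str:
    return memorized.get(question, "?")


original = [("2 + 3", "5"), ("4 + 1", "5")]
modified = [("3 + 2", "5"), ("1 + 4", "5")]  # same structure, reordered operands

gap = memorization_gap(lookup_model, original, modified)
print(gap)  # 1.0: perfect on seen items, zero on the modified ones
```

A gap near zero is what generalization would predict; a large gap suggests that the original score came from stored answers.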