Visual Mind: Why AI Now Decides for Itself How to Think
Researchers preparing for ICLR 2026 have presented a concept for adaptive switching between thinking modes. The problem with modern visual models is that…
AI-processed from Jiqizhixin (机器之心); edited by Hamidun News
You've probably noticed that modern neural networks sometimes get stuck on simple problems. They can easily write an essay about Hegel, yet fail to tell whether a key is to the left or to the right of a mug in a photo. The problem is that standard models process every query the same way, pushing all information through one fixed, heavyweight computational path regardless of how hard the question actually is.
This is inefficient and often leads to logical errors. A new paper prepared for ICLR 2026 proposes an elegant fix: adaptive switching between thinking modes. The idea is simple, but the implementation is impressive.
Researchers have developed a mechanism that lets a model assess the complexity of a visual query before it starts answering. If you simply ask an AI to find a cat in a picture, it uses a lightweight mode. But if the task requires a deep understanding of space and of the relationships between objects, the system switches to a mode called "graph thinking."
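The routing idea can be illustrated with a minimal Python sketch. The paper's actual complexity estimator is not described in this article, so a hypothetical keyword heuristic (`estimate_complexity`, `RELATIONAL_CUES`) stands in for what would in practice be a learned scorer; the 0.5 threshold is likewise an arbitrary assumption.

```python
# Hypothetical sketch of complexity-gated mode routing.
# NOT the paper's method: a keyword heuristic stands in for a learned scorer.
RELATIONAL_CUES = (
    "left of", "right of", "behind", "in front of", "between",
    "above", "below", "next to", "closer", "farther",
)


def estimate_complexity(query: str) -> float:
    """Toy complexity score in [0, 1]: counts relational cues, capped at 1."""
    q = query.lower()
    hits = sum(cue in q for cue in RELATIONAL_CUES)  # booleans sum to an int
    return min(1.0, hits / 2)


def choose_mode(query: str, threshold: float = 0.5) -> str:
    """Return 'graph' for queries judged complex, otherwise 'fast'."""
    return "graph" if estimate_complexity(query) >= threshold else "fast"


for q in ("Find the cat in the picture", "Is the key to the left of the mug?"):
    print(f"{q!r} -> {choose_mode(q)}")
# 'Find the cat in the picture' -> fast
# 'Is the key to the left of the mug?' -> graph
```

The design point is that the decision happens before any expensive reasoning runs, so easy queries never pay for the heavy mode.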
This allows the model to build a clear structure of relationships between objects, mimicking how the human brain analyzes complex scenes. For a long time, the industry followed the path of simple scaling: more parameters, more GPUs, more data. However, universal visual reasoning requires not just brute force, but architectural flexibility.
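To make "graph thinking" concrete, here is a toy scene graph in Python. It assumes an upstream detector has already produced box centers for each object (the coordinates below are invented), and it derives only four spatial relations. The names `Obj`, `build_scene_graph`, and `relations` are hypothetical and do not describe the paper's actual architecture.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Obj:
    name: str
    cx: float  # box center x, pixels
    cy: float  # box center y, pixels (image coordinates: y grows downward)


def build_scene_graph(objects):
    """Return a set of (subject, relation, object) triples for every ordered pair."""
    triples = set()
    for a, b in permutations(objects, 2):
        if a.cx < b.cx:
            triples.add((a.name, "left_of", b.name))
        elif a.cx > b.cx:
            triples.add((a.name, "right_of", b.name))
        if a.cy < b.cy:
            triples.add((a.name, "above", b.name))
        elif a.cy > b.cy:
            triples.add((a.name, "below", b.name))
    return triples


def relations(graph, subj, obj):
    """All relations recorded from subj to obj."""
    return {r for s, r, o in graph if s == subj and o == obj}


# Invented detections for the key-and-mug example.
scene = [Obj("key", 20, 80), Obj("mug", 70, 60)]
graph = build_scene_graph(scene)
print(sorted(relations(graph, "key", "mug")))  # ['below', 'left_of']
```

Once the relations are explicit triples, the key-and-mug question becomes a lookup rather than a guess.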
The authors of the work show that forcing the use of complex logical chains where they are not needed only harms accuracy. The model starts looking for hidden meaning where there is none, and ultimately hallucinates. The adaptive approach solves this problem by creating a kind of cognitive transmission for the neural network.
Why is this important for us? First, it's a direct path to creating more efficient models for robotics and autonomous vehicles. A warehouse robot doesn't need to spend all its computational power just to avoid crashing into a wall, but it desperately needs maximum concentration when sorting fragile objects of different shapes.
Second, this approach can significantly reduce the cost of running large models, because the expensive mode is used only when a query actually calls for it. We are finally moving away from the "one size fits all" concept toward smart resource allocation. Interestingly, this method echoes Daniel Kahneman's psychological theory of "fast" and "slow" thinking.
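A back-of-the-envelope sketch of why routing saves compute: the average cost per query is a weighted mix of the two modes plus the router's own overhead. All numbers below are hypothetical placeholders, not figures from the paper.

```python
def expected_cost(p_hard: float, cost_fast: float, cost_graph: float,
                  router_overhead: float = 0.0) -> float:
    """Average per-query cost when a fraction p_hard of queries is routed
    to the expensive graph mode and the rest to the fast mode."""
    if not 0.0 <= p_hard <= 1.0:
        raise ValueError("p_hard must be in [0, 1]")
    return router_overhead + (1 - p_hard) * cost_fast + p_hard * cost_graph


# Hypothetical numbers: graph mode costs 10x the fast mode, 20% of queries
# are hard, and the router itself costs 0.1 units per query.
adaptive = expected_cost(0.2, cost_fast=1.0, cost_graph=10.0, router_overhead=0.1)
always_graph = 10.0
print(f"adaptive: {adaptive:.2f}, always graph: {always_graph:.2f}")
# adaptive: 2.90, always graph: 10.00
```

The saving shrinks as the share of hard queries grows, and it assumes the router reliably sends hard queries to the graph mode; misrouting costs accuracy rather than compute.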
Scientists are essentially transferring biological principles of survival into source code. If AI learns when it should "think" and when it can answer instantly, we will get systems much closer to real intelligence than today's statistical text autocomplete. This is an important step toward turning visual AI from an advanced camera into a full-fledged analytical tool.
Key takeaway: The future belongs to flexibility, not the number of parameters. Will OpenAI and Anthropic be able to integrate such mechanisms into their next flagship models to reduce response latency?