MIT researchers teach AI to honestly say "I'm not sure" and hallucinate less
MIT demonstrated a way to reduce one of the main problems of reasoning models—confident errors. The new RLCR method teaches AI not only to provide answers…
AI-processed from MIT News; edited by Hamidun News
MIT researchers have proposed a way to make language models noticeably more honest in their responses: not just solve the task, but simultaneously assess how confident they are in their own conclusion. This sounds like a minor tweak, but in practice it targets one of the most frustrating problems of modern reasoning models — the habit of speaking in a confident tone even when the answer was essentially guessed. The new methodology does not reduce the quality of the answers themselves; rather, it helps the model better distinguish cases where it actually knows something from situations where it should acknowledge uncertainty.
The MIT CSAIL team frames the problem quite directly: today's powerful models often behave like the loudest person in the room. They answer with equal confidence whether the logic worked or whether the model simply guessed. According to the researchers, the reason lies in the very scheme of reinforcement learning currently used to develop reasoning capabilities.
In the typical setup, a model receives a reward for a correct answer and a penalty for an incorrect one, with almost nothing in between. If a model stumbles onto a correct result by chance, it is rewarded exactly as if it had carefully derived the solution. Over time, this pushes the system to always answer, leaving no room for the phrase "I'm not sure."
This is precisely what MIT set out to fix with RLCR (Reinforcement Learning with Calibration Rewards). Instead of a purely binary evaluation, the researchers added another component to the reward function: the Brier score, a metric that penalizes the gap between the confidence a model states and whether its answer actually turned out to be correct.
In practice, after a chain of reasoning, the model outputs not only an answer but also a numerical estimate of its own confidence. If it is overconfident and wrong, it is penalized. Conversely, if it gives a correct answer but reports needlessly low confidence, that is penalized as well.
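To make the idea concrete, here is a minimal sketch of a reward that combines a binary correctness signal with a Brier-score penalty. The article does not say exactly how the two terms are combined or weighted, so the simple subtraction below, along with the function name `rlcr_reward`, is an assumption for illustration rather than the researchers' implementation.

```python
def rlcr_reward(is_correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier-score penalty.

    Assumption: the two terms are combined by plain subtraction with
    equal weight; the article does not specify the exact formula.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    correct = 1.0 if is_correct else 0.0
    # Brier score for one prediction: squared gap between the stated
    # confidence and the actual outcome (1 = right, 0 = wrong).
    brier_penalty = (confidence - correct) ** 2
    return correct - brier_penalty


if __name__ == "__main__":
    # (is_correct, confidence) pairs covering the cases in the text.
    for case in [(True, 0.9), (True, 0.5), (False, 0.9), (False, 0.1)]:
        print(case, round(rlcr_reward(*case), 4))
```

Under this assumed formula, a wrong answer given with 0.9 confidence scores far lower than a wrong answer given with 0.1 confidence, and a correct answer reported at 0.5 confidence earns less than the same answer reported at 0.9.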
The authors claim that such a scheme formally serves two goals at once: high accuracy and good calibration, that is, agreement between the confidence the model reports and how often it is actually right.
Experiments were conducted on a model with 7 billion parameters. According to MIT, RLCR reduced calibration error by up to 90 percent compared to standard reinforcement learning, while accuracy did not decline and even increased in some tests.
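The article does not say which calibration metric was used. One common choice is expected calibration error (ECE), which groups predictions by stated confidence and compares each group's average confidence with its actual accuracy. The sketch below is an illustration under that assumption; the function name and the default of 10 bins are hypothetical choices, not details from the study.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: stated confidences in [0, 1]
    outcomes:    booleans, True where the answer was correct
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have equal length")
    if not confidences:
        return 0.0

    # Assign each prediction to a bin by its stated confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, 1.0 if correct else 0.0))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # A model that says "90% sure" every time but is right only half the time.
    overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
    # The same stated confidence, but right 9 times out of 10.
    calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
    print(round(overconfident, 3), round(calibrated, 3))
```

A well-calibrated model scores close to zero on this metric, while a model that is routinely more confident than it is accurate scores higher.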
RLCR's calibration gains held not only on the tasks the model was trained on but also on six datasets it had never seen. The researchers also compared the method with post-hoc approaches, in which confidence is estimated after training by an external classifier. RLCR proved superior here as well: rather than a cosmetic add-on to a finished model, it changes the system's behavior during training itself.
Moreover, the MIT team shows that standard RL not only fails to improve calibration but often makes it worse: the model becomes more capable but also more overconfident.
The confidence estimates have practical value as well. If a model generates multiple candidate answers, you can pick the one with the highest reported confidence, or weight the candidates' votes by their confidence. According to the authors, this improves both accuracy and calibration as the compute spent on inference increases.
Another interesting result: when the researchers trained separate classifiers on the model's outputs, explicit reasoning about its own uncertainty provided an additional useful signal, especially for smaller models. In other words, the model's attempt to articulate what it knows and what it doesn't turns out to be a substantive part of the prediction rather than decoration.
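The two answer-selection strategies mentioned above, picking the most confident candidate or weighting votes by confidence, can be sketched in a few lines. The function names and the `(answer, confidence)` tuple format are hypothetical choices made for this illustration; the article does not describe how the researchers implemented either strategy.

```python
from collections import defaultdict


def most_confident_answer(candidates):
    """Return the answer whose reported confidence is highest.

    candidates: non-empty list of (answer, confidence) pairs.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    answer, _ = max(candidates, key=lambda pair: pair[1])
    return answer


def confidence_weighted_vote(candidates):
    """Return the answer with the largest summed confidence across samples."""
    if not candidates:
        raise ValueError("need at least one candidate")
    totals = defaultdict(float)
    for answer, confidence in candidates:
        totals[answer] += confidence  # each vote counts in proportion to confidence
    return max(totals, key=totals.get)


if __name__ == "__main__":
    samples = [("42", 0.6), ("41", 0.7), ("42", 0.5)]
    print(most_confident_answer(samples))     # single most confident sample
    print(confidence_weighted_vote(samples))  # answer with most total confidence
```

The two strategies can disagree: in the sample data, the single most confident candidate is "41", while "42" wins the weighted vote because two moderately confident samples agree on it.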
What does this mean in practice? If the RLCR approach scales to larger commercial models, the industry gains a chance to reduce not only the number of explicit errors but also the number of dangerous errors masked by a confident tone. For fields like medicine, law, finance, and corporate analytics, this is especially important: users need not just to get an answer but to understand how much they can trust it.
MIT's work offers not another filter on top of an already trained model, but a more fundamental idea: teach AI not only to find solutions but to honestly measure the limits of its own knowledge. It is precisely this habit that could prove one of the most useful updates for the next generation of reasoning systems.