Models

Reasoning Model

A reasoning model is an AI system designed to solve complex, multi-step problems by generating explicit intermediate reasoning steps — often called a chain of thought — before producing a final answer.

A reasoning model is a language model specifically optimized for deliberate, step-by-step problem decomposition rather than for answering immediately after reading the prompt. Before committing to an answer, the model generates an internal or visible sequence of reasoning steps: working through sub-problems, checking earlier steps for errors, and integrating intermediate conclusions. This additional test-time computation lets the model trade inference latency for accuracy on tasks that direct, single-shot answer generation handles poorly.

The dominant technique for producing reasoning models is reinforcement learning from verifiable rewards (RLVR). Models are trained on domains where correctness can be checked automatically, such as mathematics problems with numeric answers, formal logic, and code with runnable test suites. Correct final answers earn a higher reward than incorrect ones (a positive reward versus a negative or zero reward, depending on the setup), and no human labels on intermediate reasoning steps are required. OpenAI's o1 (first released as a preview in September 2024) demonstrated this approach at scale; subsequent systems including OpenAI o3, DeepSeek-R1 (January 2025, open weights), Anthropic's extended thinking mode in Claude 3.7 Sonnet, and Google Gemini 2.5 Pro followed the same broad paradigm. The intermediate reasoning tokens, sometimes thousands of them, are often hidden from end users or surfaced in a collapsible thinking block.
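The reward signal described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reward function of any named system: the `Answer:` marker, the function names, and the +1/-1 reward values are all hypothetical choices made for the example. It scores only the final numeric answer and ignores the reasoning text entirely, which is the property that removes the need for step-level human labels.

```python
from fractions import Fraction
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text following the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    marker = "Answer:"
    idx = completion.rfind(marker)
    if idx == -1:
        return None
    rest = completion[idx + len(marker):].strip()
    # Keep only the first line after the marker.
    return rest.splitlines()[0].strip() if rest else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Score a completion by its final answer only (hypothetical +1/-1 scheme)."""
    answer = extract_final_answer(completion)
    if answer is None:
        return -1.0  # No parseable final answer counts as incorrect.
    try:
        # Fraction accepts "3", "0.5", and "1/2", so equivalent forms match.
        correct = Fraction(answer) == Fraction(reference)
    except (ValueError, ZeroDivisionError):
        correct = False
    return 1.0 if correct else -1.0
```

In a training loop, a score like this would be computed for each sampled completion and passed to the reinforcement learning update; the chain of thought before the marker is never graded directly.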

Reasoning models substantially improve performance on tasks requiring multi-step logical inference, where errors in early steps cascade into wrong final answers. On the American Invitational Mathematics Examination (AIME), reasoning models released in 2024–2025 posted scores comparable to those of top human competitors. On GPQA Diamond, a benchmark of PhD-level science questions, OpenAI's o3 was reported to exceed the average score of human experts. On SWE-bench, a software engineering benchmark built from real-world repository issues, reasoning models resolve a substantially higher fraction of bugs than non-reasoning counterparts.

The primary trade-off is inference cost and latency: a reasoning model may spend seconds to minutes generating a chain of thought before answering, and the additional tokens consumed can increase API costs significantly. This has driven development of efficiency variants (o3-mini, the DeepSeek-R1-Distill series, Gemini 2.5 Flash) that aim to retain much of the reasoning capability at lower compute. Choosing between a standard and a reasoning model typically depends on whether task complexity justifies the added latency and cost.
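The cost side of this trade-off is simple arithmetic, sketched below. The sketch assumes that reasoning tokens are billed at the same rate as ordinary output tokens, which should be checked against a given provider's pricing; the `Usage` type, the `request_cost` function, and all token counts and prices are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical structure for this sketch)."""
    input_tokens: int
    reasoning_tokens: int  # Chain-of-thought tokens; 0 for a standard model.
    answer_tokens: int


def request_cost(usage: Usage,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Estimate the cost of one request.

    Assumes reasoning tokens are billed as output tokens.
    """
    output_tokens = usage.reasoning_tokens + usage.answer_tokens
    total = (usage.input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Same prompt and answer length, with and without 5,000 reasoning tokens.
standard = Usage(input_tokens=1_000, reasoning_tokens=0, answer_tokens=500)
reasoning = Usage(input_tokens=1_000, reasoning_tokens=5_000, answer_tokens=500)

# Made-up prices: 1.0 per million input tokens, 4.0 per million output tokens.
standard_cost = request_cost(standard, 1.0, 4.0)
reasoning_cost = request_cost(reasoning, 1.0, 4.0)
cost_ratio = reasoning_cost / standard_cost
```

With these made-up numbers the reasoning request costs several times more than the standard one even though the visible answer is the same length, because the hidden reasoning tokens dominate the output count.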

Example

A software team submits a complex algorithmic bug report to a reasoning model. After several seconds of internal chain-of-thought, the model identifies the root cause as an off-by-one error that surfaces only on empty input, generates a targeted fix, and produces a regression test. The same query sent to a standard model returns a plausible-looking but incorrect patch.
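To make the scenario concrete, here is one hypothetical bug of that kind, written for illustration only; it is not taken from any real bug report, and the function names are invented. The buggy binary search uses an inclusive upper bound, which becomes -1 for an empty list, so the final lookup fails. The fix switches to a half-open range and guards the final comparison.

```python
from typing import Sequence


def find_index_buggy(items: Sequence[int], target: int) -> int:
    """Binary search over a sorted sequence; fails on empty input."""
    lo, hi = 0, len(items) - 1  # Inclusive upper bound: -1 when items is empty.
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    # With empty input the loop is skipped and items[0] raises IndexError.
    return lo if items[lo] == target else -1


def find_index(items: Sequence[int], target: int) -> int:
    """Fixed version: half-open range [lo, hi) plus a bounds check."""
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(items) and items[lo] == target else -1


def test_find_index_handles_empty_input() -> None:
    """Regression test covering the empty-input boundary."""
    assert find_index([], 3) == -1
    assert find_index([1, 3, 5], 3) == 1
    assert find_index([1, 3, 5], 4) == -1
```

Both versions agree on every non-empty input, which is why a patch that only inspects typical cases can look correct while leaving the boundary failure in place.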
