Inference

Chain-of-Thought (CoT)

Chain-of-Thought (CoT) is a prompting technique in which a language model generates explicit intermediate reasoning steps before producing a final answer, substantially improving accuracy on arithmetic, logic, and multi-step problems.

Rather than mapping an input directly to an output, a large language model (LLM) using CoT first writes out a sequence of intermediate reasoning steps and only then states its final answer. The approach was formally described by Wei et al. at Google Brain in a 2022 paper, which characterized it as an emergent capability of sufficiently large models (roughly 100B+ parameters at the time).

CoT can be elicited in two main ways. Few-shot CoT embeds worked examples with step-by-step solutions in the prompt, showing the model the expected reasoning format. Zero-shot CoT uses a minimal instruction, most famously "Let's think step by step", to trigger similar behavior without examples, a finding from Kojima et al. (2022). More recent systems, including OpenAI's o-series and DeepSeek-R1, internalize CoT through reinforcement learning (in DeepSeek-R1's published recipe, largely from outcome-based rewards), so the model produces structured reasoning traces as a trained behavior rather than a prompted one.
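The two prompting styles differ only in how the prompt string is assembled, which a short Python sketch can make concrete. The function names, the `EXAMPLES` list, and the pen-counting exemplar are hypothetical and exist only for illustration. The "Q:/A:" layout and the "The answer is ..." suffix follow a common convention and are not a required format. No model is called here.

```python
# Minimal sketch: building zero-shot and few-shot CoT prompts as plain strings.
# All names are illustrative assumptions; sending the prompt to a model is out of scope.

ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(question: str) -> str:
    """Append the zero-shot trigger phrase to a bare question."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot_prompt(question: str, examples: list[tuple[str, str, str]]) -> str:
    """Prepend worked (question, reasoning, answer) examples to the new question."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    # The final block is left open so the model continues with its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# One made-up exemplar showing the expected reasoning format.
EXAMPLES = [
    (
        "A shelf holds 4 boxes of 12 pens each. 10 pens are removed. How many pens are left?",
        "4 x 12 = 48 pens in total. 48 - 10 = 38 pens left.",
        "38",
    ),
]

question = (
    "A warehouse received 3 shipments of 144 units each and shipped out 275 units. "
    "How many units remain?"
)

print(zero_shot_cot_prompt(question))
print()
print(few_shot_cot_prompt(question, EXAMPLES))
```

In practice a few-shot prompt would usually carry several exemplars. One is used here only to keep the output short.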

CoT matters because it can substantially improve performance on tasks where accuracy depends on correctly sequencing multiple deductions. On the GSM8K grade-school math benchmark, PaLM 540B prompted with a handful of CoT exemplars matched or exceeded the accuracy of fine-tuned task-specific models, without any task-specific training. The technique can also aid interpretability: the reasoning trace is visible, making it easier to identify where a model's logic fails.

By 2026, CoT is ubiquitous in frontier AI systems. Research has diversified into tree-of-thought (exploring branching reasoning paths via search), skeleton-of-thought (drafting an outline of the answer and then expanding its points in parallel), and process reward models (PRMs) that score each reasoning step rather than only the final answer. Extended internal CoT traces are now a standard component of reasoning-focused models, with thinking-token counts routinely reaching tens of thousands on competition-mathematics problems.
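To illustrate what scoring each reasoning step can mean in practice, the toy Python sketch below ranks two candidate traces by their per-step scores. The scores are invented placeholders standing in for a PRM's outputs, not measurements. Aggregating by product is one common choice and is an assumption here; taking the minimum step score is another.

```python
from math import prod

# Hypothetical per-step scores in [0, 1]; a real PRM would produce these.
candidates = {
    "trace_a": [0.9, 0.8, 0.9],  # every step looks plausible
    "trace_b": [0.9, 0.2, 0.9],  # one weak middle step
}


def trace_score(step_scores: list[float]) -> float:
    """Aggregate per-step scores; the product penalizes any single weak step."""
    return prod(step_scores)


# Pick the candidate whose aggregated step score is highest.
best = max(candidates, key=lambda name: trace_score(candidates[name]))
print(best)  # trace_a
```

An outcome-only scorer sees just the final answer, so it could not tell these two traces apart if both ended on the same number.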

Example

Given the prompt "A warehouse received 3 shipments of 144 units each and shipped out 275 units. How many units remain?", a CoT-enabled model writes: "3 × 144 = 432 units received; 432 − 275 = 157 units remaining" before returning 157 as the answer. Writing out each step reduces the arithmetic errors that are common when a model tries to produce the answer in a single step.
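Because the trace in this example spells out each step, it can be checked mechanically. The Python sketch below re-computes every "a op b = c" step it finds and reads the final answer as the last number in the trace. The helper names and the regular expression are assumptions made for this illustration. It handles only non-negative integers and single binary operations, so it is not a general-purpose trace verifier.

```python
import re
from operator import add, mul, sub

# Matches simple steps such as "3 × 144 = 432" or "432 − 275 = 157".
STEP = re.compile(r"(\d+)\s*([×x*+\-−])\s*(\d+)\s*=\s*(\d+)")
OPS = {"×": mul, "x": mul, "*": mul, "+": add, "-": sub, "−": sub}

TRACE = "3 × 144 = 432 units received; 432 − 275 = 157 units remaining"


def check_steps(trace: str) -> bool:
    """Return True if at least one step is found and every step is arithmetically correct."""
    steps = STEP.findall(trace)
    return bool(steps) and all(
        OPS[op](int(a), int(b)) == int(c) for a, op, b, c in steps
    )


def final_answer(trace: str) -> int:
    """Take the last integer in the trace as the final answer (raises IndexError if none)."""
    return int(re.findall(r"\d+", trace)[-1])


print(check_steps(TRACE))   # True
print(final_answer(TRACE))  # 157
```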
