Agents

Tool Use

Tool use is the capability of an AI language model to invoke external functions, APIs, or services—such as web search, code execution, or database queries—during inference, allowing it to retrieve information and take actions beyond text generation.

Tool use is the ability of a language model to identify, during text generation, that a specific external capability is needed and to produce a structured call that invokes that capability. The model receives the tool's output and incorporates it into its continued reasoning and response. Tools extend what a language model can do from purely parametric recall to active information retrieval and real-world action.

Technically, tools are defined by a schema—typically a JSON or function-signature description—that specifies the tool's name, purpose, and parameters. At inference time, when the model determines that invoking a tool is appropriate, it outputs a structured call rather than natural language. A surrounding system intercepts that call, executes the corresponding function, and returns the result to the model as a new input. The model then continues generation with that result in context. Multiple tool calls can be chained or made in parallel depending on the framework.

Tool use matters because it overcomes fundamental limitations of static language models: they cannot access information published after their training cutoff, cannot perform precise computation reliably, and cannot directly modify external systems. With tools, a model can search the web for current information, execute code to calculate exact values, read from or write to databases, send messages, or call third-party APIs—all within a single conversational turn or agentic workflow.

As of 2026, tool use is a standard capability across all major frontier model providers, including Anthropic (Claude), OpenAI (GPT series), Google (Gemini), and Meta (Llama). Production agents commonly rely on dozens of tools. The canonical tool set for a research agent might include web search, PDF parsing, Python execution, and vector database lookup, all called adaptively as needed during a single task.

Example

A Claude agent answering a question about current stock prices invokes a web search tool to retrieve live data, then calls a code execution tool to compute percentage changes, and returns a natural language answer grounded in those results.

Related terms

← Glossary