LangChain Deep Agents cuts LLM costs by up to 80% through prompt caching


Hamidun News Editorial

AI monitoring · LangChain Blog

Jun 29, 2026 · 2 min

AI-processed from LangChain Blog; edited by Hamidun News


LangChain added automatic prompt caching to Deep Agents. According to the company, this reduces LLM token expenses by up to 80% without additional configuration or changes to agent code.

What is prompt caching and why agents need it

Prompt caching is a technique where a model provider stores a "frozen" copy of the parts of the context that recur from request to request. This can be a system prompt, conversation history, or a large set of uploaded documents. On the next request to the model, the provider does not process these tokens again: it retrieves them from the cache and charges significantly less for them.
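To make the idea concrete, here is a minimal sketch of how much of a request could be served from cache. It assumes the cache matches on the unchanged leading part of the prompt, which is how the major providers' caches generally behave. The function name and the toy "tokens" are hypothetical and are not part of any provider's API.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count how many leading tokens of `current` repeat `previous` exactly."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first difference ends the reusable prefix
        n += 1
    return n


if __name__ == "__main__":
    # Toy stand-ins for token sequences (hypothetical data).
    first_call = ["SYSTEM", "TOOLS", "DOCS", "question"]
    second_call = ["SYSTEM", "TOOLS", "DOCS", "question", "answer", "follow-up"]

    cached = shared_prefix_len(first_call, second_call)
    fresh = len(second_call) - cached
    print(f"reusable from cache: {cached}, newly processed: {fresh}")
```

The practical consequence of this assumption is that stable content (instructions, tools, documents) should come first in the prompt, since a change early in the prompt invalidates everything after it.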

For a typical chat application, caching provides a moderate benefit: the system prompt is usually short there. For agents, the picture is fundamentally different. An agent makes dozens of sequential requests to the model during a single task.

Each time it sends the same long instruction, the history of its previous actions, the loaded tools, and the documents. Without caching, all of this is processed and paid for anew at each step, even if 90% of the content hasn't changed.

A simple example: a research agent reads 50 pages of technical documentation, then makes 30 steps of reasoning and tool calls. Each step pulls the full context back into the model. With caching, the first call is charged in full; on subsequent calls only the new tokens are billed at the full rate, while the cached portion is billed at a significantly reduced rate.
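The research-agent example can be put into rough numbers. The sketch below is a back-of-the-envelope model, not LangChain's or any provider's billing logic. The tokens per page, the tokens added per step, and the cached-token rate (25% of the normal input price) are all assumptions chosen for illustration, and cache-write surcharges and output tokens are ignored.

```python
# All constants are illustrative assumptions, not figures from LangChain or a provider.
TOKENS_PER_PAGE = 500        # assumed average tokens per documentation page
PAGES = 50                   # from the example: 50 pages of documentation
STEPS = 30                   # from the example: 30 reasoning / tool-call steps
NEW_TOKENS_PER_STEP = 400    # assumed tokens each step appends to the history
CACHED_RATE = 0.25           # assumed: cached tokens billed at 25% of the full rate


def billed_tokens(use_cache: bool) -> float:
    """Input tokens billed for one task, expressed in full-price token equivalents."""
    context = TOKENS_PER_PAGE * PAGES  # tokens the model has already been sent
    total = 0.0
    for step in range(STEPS):
        if use_cache and step > 0:
            # The previously sent context is read from cache at the reduced rate.
            total += context * CACHED_RATE
        else:
            # First call, or no caching: the whole context is billed in full.
            total += context
        total += NEW_TOKENS_PER_STEP      # new tokens always cost the full rate
        context += NEW_TOKENS_PER_STEP    # and become part of the next step's context
    return total


if __name__ == "__main__":
    without_cache = billed_tokens(use_cache=False)
    with_cache = billed_tokens(use_cache=True)
    print(f"without cache: {without_cache:,.0f} token equivalents")
    print(f"with cache:    {with_cache:,.0f} token equivalents")
    print(f"savings:       {1 - with_cache / without_cache:.0%}")
```

Under these assumed values the cached run bills about 72% fewer input-token equivalents. A lower cached rate or a longer static context pushes the figure higher, and a shorter context or fewer steps pulls it down.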

How Deep Agents enables cache automatically

LangChain implemented caching so that it works without developer involvement. There's no need to dive into each provider's documentation, set special flags, or restructure the agent architecture. The framework itself determines which provider is being used and activates the required mechanism. All major players are supported:

Anthropic (Claude) — cache at the system prompt and tool descriptions level
OpenAI (GPT-4o, o3) — caching of repeating input segments
Google (Gemini) — contextual caching for long documents
Other compatible providers
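What "detect the provider and enable the right mechanism" might look like can be sketched in a few lines. This is not Deep Agents' actual implementation: `detect_provider` and `apply_caching` are hypothetical helpers, and the model-name heuristic is an assumption. The one real API detail used is Anthropic's `cache_control` marker with type `ephemeral` on a content block. OpenAI caches long repeated prefixes automatically, so the sketch leaves those requests unchanged, and Gemini's separate context-caching setup is not modeled.

```python
from copy import deepcopy


def detect_provider(model: str) -> str:
    """Guess the provider from a model name (hypothetical heuristic)."""
    name = model.lower()
    if name.startswith("claude"):
        return "anthropic"
    if name.startswith(("gpt", "o1", "o3")):
        return "openai"
    if name.startswith("gemini"):
        return "google"
    return "unknown"


def apply_caching(model: str, request: dict) -> dict:
    """Return a copy of `request` with provider-specific cache hints added."""
    prepared = deepcopy(request)  # never mutate the caller's request
    if detect_provider(model) == "anthropic":
        system = prepared.get("system")
        if isinstance(system, str):
            # Anthropic expects an explicit marker: the content up to and
            # including the marked block becomes the cacheable prefix.
            prepared["system"] = [
                {
                    "type": "text",
                    "text": system,
                    "cache_control": {"type": "ephemeral"},
                }
            ]
    # Other providers: no explicit marker is added in this sketch.
    return prepared


if __name__ == "__main__":
    request = {"system": "You are a research agent.", "messages": []}
    print(apply_caching("claude-example", request)["system"])
    print(apply_caching("gpt-4o", request)["system"])
```

Keeping the provider-specific logic behind one function is what lets the agent code stay the same when the model is swapped.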

This means the developer writes code once for LangChain Deep Agents, and the cache works everywhere. When switching providers, no additional configuration is needed.

Real savings: up to 80% on tokens

The "up to 80%" figure is achievable in specific scenarios: a long repeating context combined with many agent steps. The more requests to the model within a single task and the longer the unchanging part of the prompt, the higher the savings. For teams running agents in production, this means a significant reduction in the API bill. This is especially critical for enterprise scenarios:

Analysis of large document corpora
Multi-step research pipelines
Agents with long-term memory and extended tool context
Content generators processing hundreds of requests per day
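The dependence on step count and static context can be written as a simple formula. The model below is a simplification under stated assumptions, not a formula from LangChain: a task makes N requests, each carrying a fixed prefix of P tokens plus d uncached tokens, and cached tokens are billed at a fraction r of the normal input price. Cache-write surcharges and output tokens are ignored.

```latex
\begin{align*}
C_{\text{no cache}} &= N\,(P + d) \\
C_{\text{cache}}    &= (P + d) + (N - 1)\,(rP + d) \\
S &= 1 - \frac{C_{\text{cache}}}{C_{\text{no cache}}}
   = \frac{(N - 1)(1 - r)\,P}{N\,(P + d)}
   = \left(1 - \frac{1}{N}\right)(1 - r)\,\frac{P}{P + d}
\end{align*}
```

With a 90% static share (P/(P + d) = 0.9), an assumed r = 0.1, and N = 30 steps, this gives S ≈ 0.78. In this model S can never exceed (1 - r) · P/(P + d), here 0.81, no matter how many steps are added, which is consistent with savings being quoted as "up to" a ceiling.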

LangChain emphasizes that prompt caching is one of the simplest optimizations in agent development and among those with the highest ROI. Providers also have an incentive to expand this support: less computation means cheaper infrastructure for them.

What this means

Agent systems become expensive at scale, and prompt caching is already one of the main ways to control costs. LangChain removes the engineering barrier: developers no longer need to implement caching themselves for each provider. This lowers the barrier to entry for production agent development and makes running agents economically viable even on a limited budget.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
