LangChain optimized Deep Agents for different models: a 10–20 point performance gain

LangChain added model-specific profiles to Deep Agents for OpenAI, Anthropic, and Google. The system automatically adjusts prompts, tools, and middleware based on the selected model, taking into account its characteristics and strengths. On the tau2-bench benchmark, this yielded a 10–20 point performance gain. Developers can now work with different models without manual reconfiguration.

Hamidun News Editorial

AI monitoring · LangChain Blog

May 21, 2026· 3 min

AI-processed from LangChain Blog; edited by Hamidun News



LangChain released an update for Deep Agents — a framework for building complex multi-step AI agents. Now the system can automatically adapt to different language models: OpenAI, Anthropic, and Google. This means that the same agent can work better simply by choosing a different model. It's like changing the transmission in a car — the way you control the vehicle stays the same, but efficiency changes dramatically.

Why model adaptation is needed

Deep Agents were originally designed as a universal system that would work equally well with every model. The logic was clear: write one prompt, configure the tools once, and the setup works everywhere. In practice, that assumption did not hold. Different models have different strengths, habits, and limitations. OpenAI models are better at long chains of reasoning but need specific instructions. Claude from Anthropic likes to think out loud and handles very large contexts well. Google Gemini can call tools in parallel and responds faster. Forcing all of them into one mold is like writing a single piece of code that has to run on Python 3.8, Java, and Rust at once: something always suffers.

Before this update, Deep Agents worked, but below their potential: the system could not fully exploit the strengths of a specific model. Developers had to pick prompts, tools, and parameters by hand for each model. The result felt more like a hobby project than an enterprise solution, and it needed serious debugging before production.

Model-specific profiles — how it works

The new update adds a profiles mechanism that automatically adjusts three key system components for a specific model:

Prompts — the system reformulates instructions depending on the target model. One instruction style is used for OpenAI, another for Claude, and a third for Google. It's like writing an essay for different professors — each wants to see their own style. The system knows these preferences and adapts the text.

Tools — the set of tools and their descriptions are optimized for the model's working style. For example, OpenAI models handle JSON format for structured output better. Claude prefers text descriptions with examples. Google Gemini can select multiple tools simultaneously. The profiles take this into account when forming the tool set.

Middleware — the agent's step-processing logic adjusts to the model's reliability and speed. If the model is slower but more accurate (like Claude), middleware can increase the timeout and better handle errors. If the model is fast (like Gemini Flash), the logic can be more aggressive in retries and not wait too long.
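The middleware idea above can be sketched as a per-model step policy. This is a minimal illustration, not Deep Agents code: `RetryPolicy`, `run_step`, and the numeric values are hypothetical names and settings invented for this example, and the source does not describe the actual middleware at this level of detail.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical per-model step policy (not a Deep Agents class)."""
    timeout_s: float   # time budget passed to a single step
    max_retries: int   # extra attempts allowed after the first failure
    backoff_s: float   # pause between attempts


# Illustrative values only: patient settings for a slower, more accurate
# model, and aggressive retries with short waits for a fast model.
SLOW_ACCURATE = RetryPolicy(timeout_s=120.0, max_retries=1, backoff_s=2.0)
FAST = RetryPolicy(timeout_s=20.0, max_retries=4, backoff_s=0.0)


def run_step(step: Callable[[float], T], policy: RetryPolicy) -> T:
    """Call step(timeout_s), retrying on any exception up to max_retries times."""
    last_error: Optional[Exception] = None
    attempts = policy.max_retries + 1
    for attempt in range(attempts):
        try:
            return step(policy.timeout_s)
        except Exception as err:  # a real system would catch narrower errors
            last_error = err
            if attempt < attempts - 1:
                time.sleep(policy.backoff_s)
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error
```

The agent loop itself stays the same for every model; only the policy object passed to `run_step` changes.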

LangChain released ready-made profiles for OpenAI GPT-4 and GPT-4o, Claude 3 (Anthropic), and Google Gemini. The developer simply selects which model to use in the configuration — and the system automatically reconfigures the prompts, tools, and middleware. Manual work nearly disappears.
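A rough sketch of what "select the model and the rest follows" could look like is a lookup from provider to profile. Everything here is an assumption made for illustration: `ModelProfile`, `PROFILES`, `resolve_profile`, and the `provider:model-name` string format are hypothetical, and the field values only restate the preferences described in this article, not LangChain's actual profile contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical bundle of per-model settings (not the Deep Agents API)."""
    prompt_style: str             # how instructions are phrased
    tool_description_style: str   # how tools are presented to the model
    parallel_tool_calls: bool     # whether several tools may be called at once


# Values mirror the article's description of each provider; they are
# illustrative, not taken from LangChain's shipped profiles.
PROFILES = {
    "openai": ModelProfile("specific, explicit instructions", "json-schema", False),
    "anthropic": ModelProfile("room to think out loud", "prose-with-examples", False),
    "google": ModelProfile("concise instructions", "json-schema", True),
}

# Fallback that plays the role of the old universal configuration.
DEFAULT_PROFILE = ModelProfile("generic", "json-schema", False)


def resolve_profile(model: str) -> ModelProfile:
    """Pick a profile from a 'provider:model-name' string, else the default."""
    provider = model.split(":", 1)[0].strip().lower()
    return PROFILES.get(provider, DEFAULT_PROFILE)


if __name__ == "__main__":
    print(resolve_profile("google:gemini-flash"))
```

Keeping a default profile means an unknown model still runs, just without model-specific tuning.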

Results: +10–20 points on the benchmark

LangChain tested the new profiles on tau2-bench, an independent benchmark of multi-turn, tool-using agent tasks set in customer-service domains. The profiles delivered a gain of 10–20 points over the basic universal configuration. On some subsets of tasks the difference was larger, up to 25 points. This is not a dramatic number, but for production systems it is noticeable. Note that points are not percentages of errors: how many errors a 15-point gain removes depends on the baseline score. Either way, more completed tasks means less rework, fewer user complaints to support, and fewer incidents. For large systems with millions of calls, that can add up to millions of rubles in operational savings and less stress on engineers.
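One caveat on reading these numbers: a gain in points is an absolute change in score, so the share of failures it removes depends on where you start. The short calculation below shows this, assuming the score is a task pass rate in percent. The baseline values are hypothetical and chosen only for illustration; the source reports the gain, not these baselines.

```python
def relative_error_reduction(baseline_score: float, gain_points: float) -> float:
    """Share of baseline failures removed by an absolute gain in pass rate.

    Both arguments are in percentage points on a 0-100 scale.
    """
    if not 0 <= baseline_score < 100:
        raise ValueError("baseline_score must be in [0, 100)")
    if gain_points < 0 or baseline_score + gain_points > 100:
        raise ValueError("gain_points must keep the score within [baseline, 100]")
    baseline_failures = 100 - baseline_score
    return gain_points / baseline_failures


if __name__ == "__main__":
    # Hypothetical baselines, same 15-point gain each time.
    for baseline in (40, 60, 80):
        share = relative_error_reduction(baseline, 15)
        print(f"baseline {baseline}: +15 points removes {share:.1%} of failures")
```

With these made-up baselines, the same 15-point gain removes 25% of failures from a 40% baseline, 37.5% from 60%, and 75% from 80%.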

What it means for developers

Deep Agents become simpler and more reliable for developers and companies. Before, if you wanted to use different models in one system, you had to be an expert — manually selecting prompts, parameters, reformulating instructions, changing retry logic. Now the system does it for you.

You simply choose the model during initialization, and you're done. It's like a car with adaptive suspension that automatically adjusts stiffness depending on the road surface and speed. The driver doesn't need to remember which suspension to choose for dirt vs. asphalt: the car handles it itself. Same here: choose the model, and the system automatically knows how to use it best. For companies building multi-step AI systems (order processing, contract analysis, code generation, documentation), this saves weeks of production debugging and reduces the risk of regression when switching models.

