LangChain Loses Reasoning Content in CoT Models: How to Fix the LLM Provider Bug
Developers discovered a critical bug in LangChain: ChatOpenAI, ChatDeepSeek, and other chat classes don't transmit the reasoning content block when…
AI-processed from Habr AI; edited by Hamidun News
Developers working with CoT-models through LangChain have encountered an unpleasant surprise: the framework's chat classes—ChatOpenAI, ChatDeepSeek, and similar ones—do not preserve the reasoning content block in the final response. This means that users simply wait while the model "thinks," receiving no feedback whatsoever, while the reasoning itself disappears without a trace. The problem affects integration with most popular LLM providers and aggregators.
When a model with reasoning capabilities—for example, from the DeepSeek-R1 family or step-3.5-flash from Stepfun—generates a response, the internal reasoning process is captured in a separate reasoning_content block. This block is exactly what gets lost: neither ChatOpenAI nor other LangChain chat classes pass it further down the processing chain.
Why does this matter? CoT-models (Chain of Thought—chain of reasoning) are specifically trained to form explicit thinking steps before the final answer. Developers choose them precisely for this transparency: the ability to show the user how the model arrived at the solution, or to use intermediate steps for further processing in a pipeline. When the reasoning block is lost—the value of the CoT approach is diminished.
The absence of streaming reasoning content directly impacts UX. The user sees a blank screen while the model conducts a chain of reasoning spanning hundreds of tokens. The subjectively perceived response time increases sharply, although the model is actually already working. For products where response speed is critical, this is a noticeable drawback.
The author discovered the problem in practice while working with the stepfun/step-3.5-flash model through the Russian provider polza.ai. The provider transmits reasoning content in the stream, however LangChain does not catch it and does not pass it further. None of the tested aggregators solved the problem on their side.
The solution turned out to lie in extending LangChain's standard chat classes. The essence of the approach: redefine the method for processing streaming chunks so that it explicitly extracts the reasoning_content field from the provider's response and adds it to the output structure of AIMessage. Thus, the reasoning block becomes available both in streaming mode and in normal model calls.
In practice, this means creating a custom chat class that inherits from ChatOpenAI, with redefinition of the _stream method and the logic for assembling the final message. During streaming, reasoning_content begins to display immediately, in parallel with how the model generates reasoning—which fundamentally improves interface responsiveness.
The case is important not only as a technical solution, but also as a symptom of a broader problem: general-purpose frameworks like LangChain adapt slowly to the specifics of new model classes. API standards for transmitting reasoning content vary among different providers, there is no unified approach—and until one appears, developers will have to independently close the gaps through customization.
For teams building products on CoT-models and LangChain, the described approach provides a ready-made extension template. It is reproducible for any provider that returns reasoning_content in a format compatible with the OpenAI API.
Want to stop reading about AI and start using it?
AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.