
LangChain in production: Habr AI explained why multi-agent systems are moving to plain Python


AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

A detailed breakdown published on Habr AI explains why production multi-agent systems don't always benefit from LangChain. The author describes a stack built in plain Python and shows where universal LLM abstractions stop saving time and start undermining predictability.

Where Abstraction Breaks

The article's main criticism of LangChain is that the promise of "switching models with one line" almost never holds in a real service. In practice, even models from the same provider behave differently: the expensive version reliably returns JSON, while the cheaper one renames keys, forgets system instructions, and gets confused by few-shot examples. Add a second provider such as YandexGPT, and what breaks is not only the response format but also the categories themselves, which the downstream logic depends on.
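To make the format-drift problem concrete, here is a minimal sketch of the kind of output contract the author argues for. It is not the author's code: it uses a stdlib dataclass instead of Pydantic to stay dependency-free, and the field names category and answer, the label set, and parse_reply are hypothetical. The idea is that a renamed key or an invented category fails loudly at the boundary instead of flowing into downstream logic.

```python
import json
from dataclasses import dataclass

# Hypothetical contract: the key names and category labels are illustrative.
ALLOWED_CATEGORIES = {"schedule", "billing", "course_info"}


@dataclass(frozen=True)
class AgentReply:
    category: str
    answer: str


class ReplyFormatError(ValueError):
    """Raised when raw model output does not match the contract."""


def parse_reply(raw: str) -> AgentReply:
    """Turn raw model text into an AgentReply or raise ReplyFormatError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ReplyFormatError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ReplyFormatError("expected a JSON object")
    missing = {"category", "answer"} - set(data)
    if missing:
        # A cheaper model renaming "category" to "type" is caught here
        raise ReplyFormatError(f"missing keys: {sorted(missing)}")
    category = data["category"]
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        # A second provider inventing its own labels is caught here
        raise ReplyFormatError(f"unknown category: {category!r}")
    if not isinstance(data["answer"], str):
        raise ReplyFormatError("answer must be a string")
    return AgentReply(category=category, answer=data["answer"])
```

With a check like this in place, a reply such as {"type": "schedule", ...} raises ReplyFormatError, which a retry or fallback layer can then act on, rather than reaching the client half-valid.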

"Abstraction is harmful when it pretends different things are the same."

This leads to a second point: fallback between OpenAI and YandexGPT is a separate engineering task, not a checkbox in a config file. Each agent needs separate prompts for each provider, a shared validation schema, and a test set of real requests. In the described system, the acceptance threshold for a backup provider is 85%, and every call result is validated against a Pydantic schema before it reaches the client. On top of this, the author wraps providers in Protocol interfaces, so shared behavior goes through one interface while differences in prompts, authorization, and formats remain explicit.
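A compressed sketch of that structure, under stated assumptions: a single complete() method is taken as the shared surface, FakeProvider stands in for the real OpenAI and YandexGPT SDK clients, and all names (LLMProvider, acceptance_rate, approved_backups, call_with_fallback) are hypothetical. The 85% figure comes from the article; treating it as a pass rate over a labeled test set of real requests is this sketch's own reading.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class LLMProvider(Protocol):
    """Shared surface; prompts, auth and formats stay inside each provider."""

    name: str

    def complete(self, user_text: str) -> str: ...


@dataclass
class FakeProvider:
    """Stand-in for a real SDK client; answers come from a lookup table."""

    name: str
    replies: dict[str, str] = field(default_factory=dict)

    def complete(self, user_text: str) -> str:
        return self.replies.get(user_text, "")


def acceptance_rate(provider: LLMProvider,
                    cases: list[tuple[str, str]],
                    check: Callable[[str, str], bool]) -> float:
    """Fraction of (request, expected) test cases whose output passes check."""
    passed = sum(check(provider.complete(q), expected) for q, expected in cases)
    return passed / len(cases)


def approved_backups(candidates: list[LLMProvider],
                     cases: list[tuple[str, str]],
                     check: Callable[[str, str], bool],
                     threshold: float = 0.85) -> list[LLMProvider]:
    """Keep only providers that clear the acceptance threshold."""
    return [p for p in candidates if acceptance_rate(p, cases, check) >= threshold]


def call_with_fallback(providers: list[LLMProvider], user_text: str,
                       validate: Callable[[str], bool]) -> tuple[str, str]:
    """Try providers in order; return (provider name, reply) for the first valid one."""
    for provider in providers:
        try:
            raw = provider.complete(user_text)
        except Exception:  # sketch only: catch the SDK's specific errors in real code
            continue
        if validate(raw):
            return provider.name, raw
    raise RuntimeError("no provider produced a valid reply")
```

Because LLMProvider is a typing.Protocol, any class with a matching name attribute and complete() method satisfies it for a type checker without inheriting from it, which leaves each provider free to keep its own prompts and authorization code.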

RAG Without Magic

A separate section of the article covers RAG, where the "three lines of code" promise also runs out quickly. Switching the embedding model without reindexing the knowledge base defeats the purpose of vector search: queries and documents end up in different vector spaces, and the system keeps working on paper while retrieving the wrong passages. The same happens when the chunk size changes: old documents are split one way, new ones another, and search quality becomes a lottery that users notice before the team does.
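One cheap safeguard against both failure modes, sketched here as an assumption rather than the author's implementation: record the embedding model and chunking parameters alongside the index and refuse any write or query that doesn't match them. IndexConfig, KnowledgeIndex, chunk_text and the model name strings are hypothetical; a real deployment might keep the same fields as collection-level metadata in its vector store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexConfig:
    """Settings that must be identical at indexing time and at query time."""

    embedding_model: str
    chunk_size: int
    chunk_overlap: int


class IndexMismatchError(RuntimeError):
    """Raised instead of silently mixing incompatible vectors or chunks."""


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Fixed-size character chunks with overlap (the simplest possible splitter)."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


@dataclass
class KnowledgeIndex:
    config: IndexConfig
    chunks: list[str] = field(default_factory=list)

    def add_document(self, text: str, config: IndexConfig) -> int:
        # A different chunking scheme would leave old and new documents
        # sliced differently, so the write is refused instead.
        if config != self.config:
            raise IndexMismatchError(f"index uses {self.config}, got {config}")
        new_chunks = chunk_text(text, config.chunk_size, config.chunk_overlap)
        self.chunks.extend(new_chunks)
        return len(new_chunks)

    def check_query_model(self, embedding_model: str) -> None:
        # Vectors from another model live in a different space: fail loudly.
        if embedding_model != self.config.embedding_model:
            raise IndexMismatchError(
                f"query embedded with {embedding_model!r}, index built with "
                f"{self.config.embedding_model!r}; reindex first")
```

Under this design, changing the model or chunk size becomes an explicit migration (build a new index, reindex everything, switch over) instead of a silent drift.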

In production, the author argues, full control over the retrieval pipeline matters more than a convenient chain. When a response goes to a real client, the team needs to see not just the polished final text but the entire path to it: which filter was applied, which chunks made it into the context, what their scores were, and why one document won over another. Without this transparency, debugging turns into guesswork, and problems surface only after users complain.

What the retrieval layer keeps under explicit control:

  • score and metadata of each chunk
  • candidates that didn't make the cut
  • filtering by product, client, or scenario
  • knowledge base updates without downtime
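Most of that checklist maps onto a retrieval function that returns a trace alongside the hits. The sketch below is plain Python under stated assumptions: vectors are precomputed, filters are exact metadata matches, and the brute-force cosine loop stands in for a vector store such as ChromaDB. Chunk, RetrievalTrace, retrieve and the top_k and min_score defaults are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    vector: tuple[float, ...]
    metadata: dict[str, str]  # e.g. {"product": ..., "client": ...} (assumed keys)


@dataclass
class RetrievalTrace:
    filters: dict[str, str]
    # (chunk_id, score, metadata) for chunks that went into the context
    selected: list[tuple[str, float, dict[str, str]]] = field(default_factory=list)
    # (chunk_id, score or None, reason) for everything that did not
    rejected: list[tuple[str, Optional[float], str]] = field(default_factory=list)


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: tuple[float, ...], chunks: list[Chunk],
             filters: dict[str, str], top_k: int = 3,
             min_score: float = 0.5) -> tuple[list[Chunk], RetrievalTrace]:
    """Brute-force retrieval that records why every chunk was kept or dropped."""
    trace = RetrievalTrace(filters=dict(filters))
    scored: list[tuple[float, Chunk]] = []
    for chunk in chunks:
        # The metadata filter (product, client, scenario) runs first and is logged
        if any(chunk.metadata.get(k) != v for k, v in filters.items()):
            trace.rejected.append((chunk.chunk_id, None, "filtered out"))
            continue
        score = cosine(query_vec, chunk.vector)
        if score < min_score:
            trace.rejected.append((chunk.chunk_id, score, "below min_score"))
        else:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    for rank, (score, chunk) in enumerate(scored):
        if rank < top_k:
            trace.selected.append((chunk.chunk_id, score, chunk.metadata))
        else:
            trace.rejected.append((chunk.chunk_id, score, "outside top_k"))
    return [chunk for _, chunk in scored[:top_k]], trace
```

Logging the trace next to each answer is what lets a team answer "why did this document win?" after the fact instead of re-running the query and guessing.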

Control Matters More Than the Agent

According to the author, the most fragile part is tool calling. The model can pick the wrong tool, pass incorrect parameters, or skip the call altogether and confidently hallucinate an answer. The article illustrates this with a simple example: a user asks about their personal schedule, but the agent queries the knowledge base instead of the CRM and returns a generic course description.
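Incorrect parameters, at least, can be caught mechanically before anything runs. A minimal sketch, assuming the model proposes calls as JSON of the form {"name": ..., "arguments": {...}}; the tool names, the TOOLS registry and run_tool_call are hypothetical, and the two handlers are stubs for the CRM and knowledge-base clients.

```python
import json
from typing import Callable


def get_schedule(student_id: str) -> str:
    # Stub: a real handler would call the CRM client
    return f"schedule for {student_id}"


def search_kb(query: str) -> str:
    # Stub: a real handler would query the knowledge base
    return f"kb results for {query!r}"


# Hypothetical registry: tool name -> (argument name -> expected type, handler)
TOOLS: dict[str, tuple[dict[str, type], Callable[..., str]]] = {
    "get_schedule": ({"student_id": str}, get_schedule),
    "search_kb": ({"query": str}, search_kb),
}


class ToolCallError(ValueError):
    """Raised when a model-proposed tool call breaks the contract."""


def run_tool_call(raw_call: str) -> str:
    """Validate a JSON tool call proposed by the model, then execute it."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        raise ToolCallError("tool call is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ToolCallError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        raise ToolCallError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ToolCallError("arguments must be a JSON object")
    schema, handler = TOOLS[name]
    if set(args) != set(schema):
        # Missing or extra arguments: refuse rather than guess
        raise ToolCallError(f"{name} expects {sorted(schema)}, got {sorted(args)}")
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            raise ToolCallError(f"{name}.{key} must be {expected_type.__name__}")
    return handler(**args)
```

A check like this stops malformed calls, but it cannot tell that search_kb was the wrong choice for a personal-schedule question. That is a routing problem, not a validation one.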

Trying to fix tool-calling errors with prompts alone often creates new edge cases, because the model makes decisions probabilistically rather than deterministically. For this reason, the author moves critical routing out of the LLM into a rule-based classifier and uses the model only for ambiguous cases. Around that sit explicit agent contracts, separate clients for the CRM and the knowledge base, typed responses, retries for transient failures, and escalation to a human when no valid result can be obtained.
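A compressed sketch of that control flow, under stated assumptions: the regex rules, route names, TransientError and the handle signature are hypothetical; llm_classify stands in for an LLM call that is used only when no rule fires; and escalation to a human is reduced to returning a "human" route with a reason.

```python
import re
import time
from typing import Callable

# Hypothetical rules, first match wins. Personal-schedule questions always go
# to the CRM agent, regardless of how the LLM would have read them.
RULES: list[tuple[re.Pattern, str]] = [
    (re.compile(r"\b(my|mine)\b.*\b(schedule|timetable|lesson)s?\b", re.I), "crm"),
    (re.compile(r"\b(price|cost|refund)s?\b", re.I), "billing"),
]


class TransientError(Exception):
    """Stands in for timeouts and 5xx responses from an upstream API."""


def route(text: str, llm_classify: Callable[[str], str]) -> str:
    """Deterministic rules first; the LLM only sees what no rule matched."""
    for pattern, target in RULES:
        if pattern.search(text):
            return target
    return llm_classify(text)


def call_with_retry(fn: Callable[[], str], attempts: int = 3,
                    base_delay: float = 0.5) -> str:
    """Retry transient failures with exponential backoff (attempts must be >= 1)."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")


def handle(text: str, llm_classify: Callable[[str], str],
           agents: dict[str, Callable[[str], str]],
           is_valid: Callable[[str], bool],
           retry_delay: float = 0.5) -> tuple[str, str]:
    """Return (route, reply), or ("human", reason) when escalation is needed."""
    target = route(text, llm_classify)
    agent = agents.get(target)
    if agent is None:
        return "human", f"no agent for route {target!r}"
    try:
        reply = call_with_retry(lambda: agent(text), base_delay=retry_delay)
    except TransientError:
        return "human", "upstream kept failing"
    if not is_valid(reply):
        return "human", "no valid result"
    return target, reply
```

One practical upside of keeping the rules as plain data: a misrouted question can be fixed with a new rule plus a regression test, rather than another round of prompt edits.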

Security is another argument: the smaller the framework footprint and its transitive dependencies, the smaller the attack surface and the easier it is to understand what actually protects the system. The resulting system runs on FastAPI and calls the provider SDKs, ChromaDB, and the Bitrix24 API directly instead of going through a general-purpose orchestration layer. The open-source version comes to about 4,500 lines of code, 170 tests, and 84% test coverage.

This takes more manual work than using a ready-made framework, but every step can be logged, reproduced, and checked on its own. For production, that is the key trade-off: less magic and more code, in exchange for more predictable behavior under failures, fallbacks, and non-standard requests.

What It Means

The Habr AI analysis neatly captures a shift in LLM development: frameworks are still convenient for prototypes and demos, but in production, value shifts toward explicit contracts, validation, and observability. The more providers, integrations, and business risks a system has, the harder it is to hide their differences behind a single abstraction, and the more important it becomes to control each integration point explicitly.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
