Bitrix24 listed eight common mistakes in developing MCP servers for LLMs
Bitrix24 published an unusually useful breakdown of MCP servers without theory for theory’s sake. At the center are eight practical pitfalls: weak OAuth…
AI-processed from Habr AI; edited by Hamidun News
A developer from the Bitrix24 AI team has published a practical breakdown of the errors that most often break MCP servers for LLMs. The main idea is simple: MCP looks like a deterministic wrapper over an API, but that wrapper is driven by a non-deterministic model, so familiar engineering approaches regularly fail here.
Where everything breaks
The first problem is authorization. In the specification, everything looks neat: there's OAuth, clear fields, and an expected flow. In practice, MCP clients support it unevenly: some implement authorization only partially, some rely on custom extensions, and some barely support it at all. For local stdio servers this is not so critical, but as soon as the server is exposed over the network, fragmentation begins. This is why teams often end up with a less elegant but stable option: pre-issued tokens that the user manually adds to the config.
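The pre-issued token fallback can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the `ISSUED_TOKENS` table, the token value, and the `resolve_user` helper are all hypothetical, and a real service would store hashed tokens rather than a hard-coded dictionary.

```python
import hmac
from typing import Optional

# Hypothetical token table: token -> user id. Assumed for illustration only;
# a real service would keep hashed tokens in a database.
ISSUED_TOKENS = {"tok_demo_123": "user_42"}


def resolve_user(authorization_header: Optional[str]) -> Optional[str]:
    """Return the user id for a valid 'Bearer <token>' header, else None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    for known, user_id in ISSUED_TOKENS.items():
        # Constant-time comparison avoids leaking token contents via timing.
        if hmac.compare_digest(known.encode(), token.encode()):
            return user_id
    return None
```

The user pastes the token into the client config once, and the server only has to validate a header, which sidesteps differences in how clients implement the OAuth flow.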
The second big trap is the temptation to mirror a Swagger spec in MCP one-to-one. For a developer, this looks logical: each API endpoint becomes a separate tool. For the model, it is a selection trap. When it has dozens of similar tools, it starts confusing scenarios, choosing the wrong commands, and making mistakes in parameters. It gets even worse when a long chain of actions is required: find a user, remember the ID, create a task, then add an observer. A human will cope, but a model easily loses state halfway through.
- Clients implement authorization differently, so the same server behaves unpredictably.
- A large set of tools reduces the chance that the model will select the correct one.
- Long chains of calls increase the risk of confused IDs and incorrect parameters.
- Errors without explanation of the next step leave the model stuck.
- Overly large responses quickly consume context and break the dialog.
How to design tools
The author's conclusion is harsh but useful: an MCP tool should be designed not as a reflection of the database structure, but as an action the user would recognize. If a human needs to "assign a task to Ivan for tomorrow," the tool should accept a name, a deadline, and the task text, rather than forcing the model to look up a user_id, then a task_id, and then link the entities. The more self-sufficient the tool, the higher the chance that the model will complete the scenario without glitches and without improvised orchestration inside the prompt.
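A minimal sketch of such an intent-level tool might look like the following. Everything here is assumed for illustration: the in-memory `USERS` and `TASKS` stores, the `ToolResult` type, and the `assign_task` signature are hypothetical and are not taken from the Bitrix24 API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Hypothetical in-memory data standing in for a real backend.
USERS: Dict[int, str] = {1: "Ivan Petrov", 2: "Anna Smirnova", 3: "Ivan Sidorov"}
TASKS: List[dict] = []


@dataclass
class ToolResult:
    ok: bool
    message: str


def assign_task(assignee_name: str, deadline: str, text: str) -> ToolResult:
    """Create a task for a person identified by name, due on an ISO date."""
    needle = assignee_name.strip().lower()
    matches = [(uid, name) for uid, name in USERS.items() if needle in name.lower()]
    if not matches:
        return ToolResult(False, f"No user matches '{assignee_name}'. Ask the user for the full name.")
    if len(matches) > 1:
        options = ", ".join(name for _, name in matches)
        return ToolResult(False, f"Several users match '{assignee_name}': {options}. Ask which one is meant.")
    try:
        due = date.fromisoformat(deadline)
    except ValueError:
        return ToolResult(False, "deadline must be an ISO date such as 2025-01-31.")
    uid, name = matches[0]
    # The user lookup and the task creation happen inside one call,
    # so the model never has to carry an ID between steps.
    TASKS.append({"assignee_id": uid, "due": due, "text": text})
    return ToolResult(True, f"Task created for {name}, due {due.isoformat()}.")
```

The model passes what the user actually said (a name, a date, a text), and ambiguity comes back as a message it can relay instead of a silently wrong ID.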
"Tools should reflect user intent, not the structure of your database."
The second part of design is text. For a model, the tool's name, its description, and the input schema fields effectively are the interface. It doesn't see the README, doesn't know the architecture, and doesn't understand the author's intentions beyond the JSON schema. Wording should therefore be semantically dense: short but precise. The difference between `search_users`, `find_users`, and `lookup_user_by_email` is not cosmetic for an LLM but behavioral. The same applies to errors: a good error message doesn't just report a failure, but hints at why it happened and what to try next.
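As a rough illustration, a tool definition and an actionable error could look like this. The `name`, `description`, and `inputSchema` keys and the `isError` flag follow the shape used for MCP tools. The concrete wording, the example email, and the `actionable_error` helper are assumptions made for this sketch.

```python
import json

# Tool definition as a plain dict; the description says when to use the tool
# and when to prefer a different one.
LOOKUP_USER_BY_EMAIL = {
    "name": "lookup_user_by_email",
    "description": (
        "Return exactly one user by exact email address. "
        "Use search_users instead when only a name is known."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Full email address, for example ivan@example.com.",
            }
        },
        "required": ["email"],
    },
}


def actionable_error(what_failed: str, why: str, next_step: str) -> dict:
    """Build a tool result that tells the model what to try next."""
    text = f"{what_failed} Reason: {why} Next step: {next_step}"
    return {"isError": True, "content": [{"type": "text", "text": text}]}


error = actionable_error(
    "User not found.",
    "No account uses the email ivan@example.com.",
    "Call search_users with the person's name, or ask the user to check the address.",
)
print(json.dumps(error, indent=2))
```

A bare "404 Not Found" gives the model nothing to act on, whereas the three-part message points it toward a concrete next call.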
Tests and protection
Classical unit tests are necessary here, but they are not enough. They check the tool's code, not how the model will select and call it. The article therefore recommends evals: sets of user requests that show which tool was invoked, with what parameters, and how well the response matches the scenario. The problem is that model behavior is unstable and changes even between adjacent versions. What worked in one version of GPT or Claude may behave differently after an update, so manual testing in chat remains a mandatory part of development for now.
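An eval set of this kind can be sketched as a list of requests with the expected tool and arguments, plus a loop that scores the model's choices. The names `EvalCase`, `run_evals`, and `stub_model` are hypothetical. `stub_model` is a trivial stand-in so the example runs offline, and in practice it would be replaced by a call to the real model with the server's tool list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, str]]


@dataclass
class EvalCase:
    request: str
    expected_tool: str
    expected_args: Dict[str, str] = field(default_factory=dict)


def run_evals(cases: List[EvalCase], choose_tool: Callable[[str], ToolCall]) -> float:
    """Return the share of cases where the tool and the expected args match."""
    passed = 0
    for case in cases:
        tool, args = choose_tool(case.request)
        args_ok = all(args.get(k) == v for k, v in case.expected_args.items())
        if tool == case.expected_tool and args_ok:
            passed += 1
        else:
            print(f"FAIL: {case.request!r} -> {tool} {args}")
    return passed / len(cases) if cases else 0.0


def stub_model(request: str) -> ToolCall:
    """Stand-in for a real model call; picks a tool with a naive rule."""
    if "@" in request:
        email = next(word for word in request.split() if "@" in word)
        return "lookup_user_by_email", {"email": email}
    return "search_users", {"query": request}


CASES = [
    EvalCase("Find the user with email ivan@example.com",
             "lookup_user_by_email", {"email": "ivan@example.com"}),
    EvalCase("Who is Ivan Petrov?", "search_users"),
    EvalCase("Assign a task to Ivan for tomorrow", "assign_task"),
]

pass_rate = run_evals(CASES, stub_model)
print(f"pass rate: {pass_rate:.2f}")
```

Rerunning the same cases after every model or description change makes regressions visible as a drop in the pass rate rather than as a surprise in production.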
A separate section is devoted to security. MCP expands the attack surface from two sides at once: through the user, via prompt injection, and through the data returned by the server itself or by external systems. If a tool has extra privileges, the model will sooner or later try to do more than it should. The practical recipe is minimum privileges, filtering of external content, explicit confirmation for irreversible actions, and a limit on response size. The author recommends returning only the necessary fields, a maximum of 10-20 records at a time, and always remembering that even a powerful model stops being useful when its context is cluttered with raw JSON.
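Two of these measures, trimming responses and gating irreversible actions, can be sketched as follows. The field whitelist, the cap of 20 records, and the `shape_response` and `delete_task` helpers are assumptions made for illustration. A `confirmed` flag that the model sets itself is only a soft guard, so a real deployment would want the confirmation to come from the client UI.

```python
from typing import Any, Dict, List

MAX_RECORDS = 20                            # upper bound of the 10-20 range above
ALLOWED_FIELDS = ("id", "title", "deadline")  # hypothetical field whitelist


def shape_response(records: List[Dict[str, Any]], limit: int = MAX_RECORDS) -> Dict[str, Any]:
    """Keep only whitelisted fields and cap the number of returned records."""
    limit = max(1, min(limit, MAX_RECORDS))
    items = [{k: r[k] for k in ALLOWED_FIELDS if k in r} for r in records[:limit]]
    return {
        "items": items,
        "returned": len(items),
        "total": len(records),
        # Tells the model that more data exists without sending it.
        "truncated": len(records) > limit,
    }


def delete_task(task_id: int, confirmed: bool = False) -> str:
    """Refuse to perform an irreversible action without explicit confirmation."""
    if not confirmed:
        return (
            f"Deleting task {task_id} is irreversible. Ask the user to confirm, "
            "then call again with confirmed=true."
        )
    # The actual deletion is omitted in this sketch.
    return f"Task {task_id} deleted."
```

Reporting `total` and `truncated` alongside the trimmed items lets the model ask for a narrower query instead of assuming it has seen everything.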
What this means
MCP servers are quickly moving from experiments to production, and the cost of tool design errors is growing with them. The teams that win will not be those with the most API methods, but those that know how to build short, understandable, and safe actions for the model, and then keep rechecking them against real scenarios.