Claude Code and Codex: how to reduce token losses with three markdown files

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Apr 28, 2026. Reading time: 4 min.


Hamidun News Editorial

AI monitoring · Habr AI




AI agents for development burn through context not because they answer poorly, but because they spend almost all their time searching for the right place in the code. Even with a window of a million tokens, they traverse directories again, re-read familiar files, and check servers as if seeing the project for the first time. An analysis shows that for a simple question about payments, the agent spent over 80 thousand tokens and over 15 tool calls, while the answer itself took about 800 tokens.
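The proportions in that example can be checked with a few lines of arithmetic. This is only a rough sketch: the source gives the figures as "over 80 thousand" and "about 800", so the round numbers below are approximations, and it is an assumption that the answer tokens are counted inside the total.

```python
# Approximate figures from the example above (the source says
# "over 80 thousand" and "about 800", so these are rounded).
total_tokens = 80_000
answer_tokens = 800

# Assumption: the answer tokens are part of the total.
answer_share = answer_tokens / total_tokens
navigation_share = 1 - answer_share

print(f"answer: {answer_share:.1%}, navigation and overhead: {navigation_share:.1%}")
```

Under these assumptions the answer itself accounts for roughly one percent of the tokens spent.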

In other words, almost the entire budget went not to thinking, but to navigation. The problem turned out not to be a local quirk of Claude Code, but a general limitation of modern coding agents. Cursor, Codex, and Gemini CLI work the same way: without a workspace map, they start each new session with reconnaissance.

If there's one project, it's tolerable. But when a developer has dozens of repositories, VPS instances, and staging environments, the agent first greps the home directory, finds similar files in neighboring projects, reads them, then realizes it went the wrong way, and launches a new round of searching. In a real example, a question about payment methods in one bot turned into searching across multiple projects, re-reading six files, and even SSH-checking the server configuration.

Such a mode is not only expensive but also fragile: the model spends its effort on orientation and easily misses relevant places.

The author examines three popular approaches that are usually offered as a cure. The first is RAG and vector search. It does a good job of finding semantically similar fragments but handles project structure poorly: it might return chunks that mention auth, login, and token, yet fail to reconstruct the exact chain of dependencies between middleware, refresh logic, and JWT configuration. Moreover, RAG requires separate infrastructure, an index, and reindexing, and each query adds latency.

The second path is static analysis and dependency graphs built with AST tools and tree-sitter. This is useful within one repository, but almost useless at the level of a portfolio of projects, where you need to answer not just how a function works, but where the needed service actually lives, what server it runs on, and what its status is.

The third option is to keep CLAUDE.md in each project. This helps, but only after the agent has already figured out which project to go to.

Instead, the author proposes a hierarchical context that guides the agent from top to bottom. At the zero level lies a global map of projects: a short table with names, paths, servers, and statuses, which is automatically loaded into each session.
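A minimal sketch of how such a zero-level map could be generated is shown below. The four columns follow the description above; the project names, paths, servers, and the `render_project_map` helper are hypothetical, and where the resulting file lives and how it is loaded into each session depends on the agent and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    path: str
    server: str
    status: str


def render_project_map(projects: list[Project]) -> str:
    """Render a short markdown table with one row per project."""
    lines = [
        "# Project map",
        "",
        "| Project | Path | Server | Status |",
        "| --- | --- | --- | --- |",
    ]
    for p in projects:
        lines.append(f"| {p.name} | {p.path} | {p.server} | {p.status} |")
    return "\n".join(lines) + "\n"


# Hypothetical entries, for illustration only.
PROJECTS = [
    Project("shop-bot", "~/projects/shop-bot", "vps-1", "production"),
    Project("billing-api", "~/projects/billing-api", "vps-2", "staging"),
]

project_map = render_project_map(PROJECTS)
print(project_map)
```

Keeping the map as one row per project means that a moved service or a changed status is a one-line edit.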

At the first level is CLAUDE.md in the root of a specific project with the stack, key files, deployment commands, service name, and logs. Between them, an intermediate layer can be added in the form of Graphify if the codebase is large and an accurate dependency graph is needed.
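The first-level file can be sketched in the same way. CLAUDE.md is free-form markdown, so the section headings below are an assumption rather than a required layout, and every value (project name, stack, file paths, deploy script, service name, log path) is hypothetical.

```python
def render_project_file(
    name: str,
    stack: list[str],
    key_files: dict[str, str],
    deploy_command: str,
    service_name: str,
    log_path: str,
) -> str:
    """Render a project-level CLAUDE.md with the fields listed in the text."""
    lines = [f"# {name}", "", "## Stack"]
    lines += [f"- {item}" for item in stack]
    lines += ["", "## Key files"]
    lines += [f"- `{path}`: {purpose}" for path, purpose in key_files.items()]
    lines += [
        "",
        "## Deployment",
        f"- Command: `{deploy_command}`",
        f"- Service: `{service_name}`",
        f"- Logs: `{log_path}`",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical project, for illustration only.
content = render_project_file(
    name="shop-bot",
    stack=["Python 3.12", "PostgreSQL"],
    key_files={
        "bot/payments.py": "payment methods and provider callbacks",
        "bot/config.py": "environment-based settings",
    },
    deploy_command="./deploy.sh production",
    service_name="shop-bot.service",
    log_path="/var/log/shop-bot/app.log",
)
print(content)
```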

As a third markdown layer, the author proposes storing past sessions and engineering decisions as markdown files with YAML frontmatter, so the agent can recall what was already discussed, which files were changed, and what decisions about debugging or payments were made a week earlier. The idea is simple: first the map, then the project description, then memory of past discussions, and only then the source code.

Measurements show that this scheme provides not cosmetic but practical gains. For a question about project architecture, the blind agent needed 12 tool calls versus one with the hierarchy. For a question about which projects use a specific library, the blind mode made 44 calls, scanned the entire disk, and still missed one of the three relevant projects; the hierarchy needed only two precise queries and gave a complete answer. In the case of deployment, the effect is even more noticeable: without structure, the agent read configs and connected over SSH, but with a properly filled CLAUDE.md it could answer directly from context without any additional calls. The important conclusion here is that a better-organized context improves not only speed and token savings, but also answer accuracy.

Why does this work better than the familiar RAG pipeline? Because markdown files give the agent zero latency, predictability, and simple updates. Developers decide for themselves what exactly is important to know about the project, rather than hoping the ranker will pull the needed chunks from the index. If the deployment changed or a service moved, it's enough to fix one line.
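The session-memory layer described earlier (markdown files with YAML frontmatter) can be sketched with the standard library alone. The field names (`date`, `project`, `topics`, `files_changed`), the sample values, and both helper functions are assumptions made for illustration; the source does not specify a schema.

```python
from datetime import date


def render_session_note(
    day: date,
    project: str,
    topics: list[str],
    files_changed: list[str],
    summary: str,
) -> str:
    """Render a session note: YAML frontmatter followed by a markdown body."""
    front = [
        "---",
        f"date: {day.isoformat()}",
        f"project: {project}",
        f"topics: [{', '.join(topics)}]",
        f"files_changed: [{', '.join(files_changed)}]",
        "---",
    ]
    return "\n".join(front) + "\n\n" + summary.strip() + "\n"


def read_frontmatter(text: str) -> dict[str, str]:
    """Read flat `key: value` lines between the two `---` markers.

    Deliberately tiny: values come back as raw strings, and this is
    not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


# Hypothetical session, for illustration only.
note = render_session_note(
    day=date(2026, 4, 21),
    project="shop-bot",
    topics=["payments", "debugging"],
    files_changed=["bot/payments.py"],
    summary="Decided to keep card payments behind a feature flag.",
)
meta = read_frontmatter(note)
print(meta["project"], meta["topics"])
```

With metadata at the top of each note, an agent or a simple script can filter past sessions by project or topic before opening any of the bodies.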

Scaling also looks reasonable: the project map takes about 2 KB, and fifteen project files of 5 KB each add up to less than 80 KB of structured context instead of hundreds of kilobytes of raw source code. Against the backdrop of talk about million-token windows, this is especially important: more tokens don't always mean better results. Irrelevant information dilutes the model's attention, and the "lost in the middle" effect hasn't gone away.
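The size estimate above is easy to reproduce. The sketch below sums the map and the project files; the token conversion at the end relies on a rough rule of thumb of about 4 bytes per token, which is an assumption and varies with the tokenizer and the language of the text.

```python
MAP_KB = 2            # global project map, from the text
PROJECT_FILE_KB = 5   # one project-level file, from the text
PROJECT_COUNT = 15

total_kb = MAP_KB + PROJECT_FILE_KB * PROJECT_COUNT
print(f"structured context: {total_kb} KB")

# Assumption: roughly 4 bytes per token. Real ratios depend on the
# tokenizer and on the language the files are written in.
BYTES_PER_TOKEN = 4
approx_tokens = total_kb * 1024 // BYTES_PER_TOKEN
print(f"roughly {approx_tokens} tokens under that assumption")
```

Even with the map included, the total stays under the 80 KB mentioned above, a small fraction of a million-token window.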

The main takeaway from the analysis is that the token problem in coding agents should usually be solved not by expensive models and not by complicating the stack, but by context discipline. A global map of projects, a good CLAUDE.md, and saved session memory can be assembled literally in ten minutes, and the payoff appears immediately: less blind searching, fewer repetitions, fewer errors, and a shorter path from question to the needed file.

