
Tokentap and MitM Proxies for LLMs: How to Monitor Tokens, Costs, and Data Leaks


AI-processed from Habr AI; edited by Hamidun News

Cloud LLMs have become a standard tool for code generation, CLI work, and agent workflows. But as these models become more deeply embedded in development, two acute questions arise: what data actually leaves the machine, and how much money long automated runs quietly burn. The demand is no longer only for new models, but also for a control layer between the developer and the API.

Where the Risk Comes From

When a developer works with a cloud model manually, token consumption can still be spotted in the bill or the logs. With CLI utilities, and especially with agents, the situation quickly gets out of control. The tool can send large chunks of code, configs, error traces, internal documentation, and even sensitive fragments that the user never intended to share with an external service. In everyday work this often goes unnoticed, because everything happens inside the familiar workflow.

The second problem is cost. An agent that runs autonomously can make dozens or hundreds of calls without constant human involvement. One failed cycle, an overly long context, or an endless series of follow-up requests quickly turns into a noticeable bill. For teams, this is especially unpleasant because cost overruns are usually discovered after the fact, when the money has already been charged. What is needed is a layer of observability between the local tool and the cloud model, not just a final number in the provider's dashboard.
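To make the cost problem concrete, here is a minimal back-of-the-envelope sketch. The per-token prices and the shape of the agent loop are assumptions chosen for illustration, not figures from any provider or from the article. The sketch shows how a loop that resends a growing context on every step accumulates cost.

```python
from dataclasses import dataclass

# Assumed prices in USD per one million tokens (hypothetical; real prices
# differ by provider and model).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class Call:
    input_tokens: int
    output_tokens: int


def call_cost(call: Call) -> float:
    """Cost of one API call in USD under the assumed prices."""
    return (
        call.input_tokens * INPUT_PRICE_PER_MTOK
        + call.output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def run_cost(calls: list[Call]) -> float:
    """Total cost of a whole agent run in USD."""
    return sum(call_cost(c) for c in calls)


# Hypothetical agent run: 100 steps, a 10,000-token starting context, and
# 2,000 extra tokens of history resent on every following step.
calls = [
    Call(input_tokens=10_000 + 2_000 * step, output_tokens=500)
    for step in range(100)
]
total = run_cost(calls)
print(f"{len(calls)} calls, about ${total:.2f}")  # about $33.45 under these assumptions
```

Under these assumptions, almost all of the total comes from input tokens: the repeated context, not the model's answers, dominates the bill.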

How Tokentap Helps

This is where Tokentap, previously known as Sherlock, comes in. The idea is simple: place a MitM proxy between the LLM CLI and the remote model and show token usage in real time, directly in the console. Instead of abstract analytics after the fact, the developer gets a live picture of how the tool actually behaves during a session. This is useful both for individual development and for teams where several people use different AI tools at the same time.

  • Real-time token monitoring during sessions
  • Cost control before the bill arrives
  • Better visibility into suspicious requests
  • Transparency in autonomous agent operations
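The real-time counting idea can be sketched as follows. This is not Tokentap's implementation, which the article does not describe. It is a minimal illustration that assumes the proxy can read non-streaming JSON response bodies and that the provider reports usage in a `usage` object, with either `prompt_tokens`/`completion_tokens` or `input_tokens`/`output_tokens` fields. The `SessionTally` class and its method names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class SessionTally:
    """Running token totals for one proxy session (hypothetical helper)."""

    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def add(self, response_body: bytes) -> None:
        """Update the totals from one JSON response body seen by the proxy."""
        try:
            payload = json.loads(response_body)
        except ValueError:
            return  # not JSON (for example a streamed response); ignored here
        usage = payload.get("usage") if isinstance(payload, dict) else None
        if not isinstance(usage, dict):
            return
        # Assumption: the usage object follows one of two common naming styles.
        self.input_tokens += int(
            usage.get("prompt_tokens", usage.get("input_tokens", 0)) or 0
        )
        self.output_tokens += int(
            usage.get("completion_tokens", usage.get("output_tokens", 0)) or 0
        )
        self.calls += 1

    def line(self) -> str:
        """One-line status suitable for printing to the console."""
        return f"calls={self.calls} in={self.input_tokens} out={self.output_tokens}"


tally = SessionTally()
tally.add(b'{"usage": {"prompt_tokens": 120, "completion_tokens": 30}}')
tally.add(b'{"usage": {"input_tokens": 80, "output_tokens": 20}}')
print(tally.line())  # calls=2 in=200 out=50
```

Streaming responses would need separate handling, since usage data may arrive in a later chunk rather than in a single JSON body.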

The practical value of this approach goes beyond savings. The proxy helps detect anomalies earlier: overly long requests, unexpectedly inflated context, repeated calls, and suspicious volumes of transmitted data. For security teams, it is an additional control point where they can check whether internal secrets, client data, or unnecessary parts of a repository are being sent to the external API. For team leads and platform teams, it is also a way to introduce basic discipline in LLM usage without strict bans on cloud tools.
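The checks described above can be sketched as a small inspection step that runs on each outgoing request body. Everything here is an illustrative assumption rather than a description of any existing tool: the three patterns, the size threshold, the repeat limit, and the function names are hypothetical, and a real deployment would need a much broader rule set.

```python
import hashlib
import re
from collections import Counter

# Illustrative patterns only (assumed for this sketch).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password_assignment": re.compile(r"\bpassword\s*[=:]\s*\S+", re.IGNORECASE),
}

MAX_REQUEST_CHARS = 200_000  # hypothetical threshold for an "overly long" request
REPEAT_LIMIT = 3             # hypothetical number of identical requests tolerated

_seen_requests: Counter = Counter()


def inspect_request(body: str) -> list[str]:
    """Return the names of all checks that the outgoing request body trips."""
    findings = [
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(body)
    ]
    if len(body) > MAX_REQUEST_CHARS:
        findings.append("oversized_request")
    return findings


def is_repeated(body: str) -> bool:
    """True once the same body has been sent more than REPEAT_LIMIT times."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    _seen_requests[digest] += 1
    return _seen_requests[digest] > REPEAT_LIMIT


# The key below is the well-known placeholder from AWS documentation.
print(inspect_request("config: AKIAIOSFODNN7EXAMPLE"))  # ['aws_access_key_id']
```

Whether a finding should block the request or only raise a warning is a policy choice; the sketch only reports.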

Where This Is Useful

These tools are needed most where AI stops being a toy and becomes part of the production workflow. If a team uses code agents, automatic bug fixes, patch generation, or long research chains, costs and risks can grow much faster than the number of tasks. In such a scenario, a MitM proxy works like a car dashboard: it doesn't stop you from moving forward, but it shows your speed, temperature, and fuel level. This is especially important for companies that need to maintain development speed, meet security requirements, and keep LLM experiments from becoming an uncontrolled expense item, all at the same time.

What This Means

The LLM tool market is gradually shifting from simple text generation to control infrastructure. Teams no longer just need an answer from the model: they need to understand exactly what was sent, how much it cost, and whether the process violates internal security rules. A MitM proxy like Tokentap is therefore not a niche utility for enthusiasts, but a sign that AI-assisted development is maturing, with observability and cost management becoming as fundamental as logs, metrics, and alerts.

