Claude and AI agents burn through tokens faster than forecast, and businesses are learning "tokenomics"
Companies that deployed AI agents to write code have run into "insane" token consumption, several times higher than forecast. A Silicon Valley software developer and…
AI-processed from Wired; edited by Hamidun News
Companies that deployed AI agents to write code have run into an unexpected problem: actual token costs turned out to be significantly higher than forecast, and their executives are now urgently learning a new discipline called "tokenomics."
What Happened to Budgets
Wired spoke with several companies about how they manage AI agent expenses in real conditions. One source, a director at a Silicon Valley software development company, described the situation as "insane": his team started actively using Claude from Anthropic for coding, and token consumption skyrocketed to levels no one had budgeted for. An e-commerce company reported a similar situation: AI agents working in the background generate thousands of tokens on tasks that seem routine, such as code review, test writing, and refactoring. Where a developer would spend an hour on such a task, the agent "thinks" for only several minutes, but it generates tokens continuously the whole time, and the monthly bill ends up looking nothing like the forecast.
Why Agents Burn Tokens So Fast
A token is the billing unit used by AI providers: every request to a model and every response is charged by token count. In ordinary chat use this is barely noticeable, but AI coding agents work on a fundamentally different principle:
- before each action they read the entire repository context
- within a single task they call the model repeatedly
- they generate long chains of internal reasoning before responding
- they write, test, and rewrite code until they get the desired result
As a result, a task that a developer solves in an hour can "cost" tens of thousands of tokens. At the high rates charged for powerful models like Claude, this adds up to hundreds of dollars per working day per employee.
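To see how these factors compound, here is a rough back-of-the-envelope sketch. The task profile and the per-million-token prices are illustrative assumptions, not figures from the Wired report or from any provider's price list, and `AgentTask`, `task_tokens`, and `task_cost_usd` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """Hypothetical profile of one agent task; every value is an assumption."""
    context_tokens: int  # tokens sent as input on every model call
    output_tokens: int   # tokens generated per call (reasoning plus code)
    model_calls: int     # how many times the agent calls the model


def task_tokens(task: AgentTask) -> tuple[int, int]:
    """Return (total input tokens, total output tokens) for one task."""
    total_input = task.context_tokens * task.model_calls
    total_output = task.output_tokens * task.model_calls
    return total_input, total_output


def task_cost_usd(task: AgentTask,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Estimate cost in USD given assumed prices per million tokens."""
    total_input, total_output = task_tokens(task)
    return (total_input / 1_000_000 * input_price_per_mtok
            + total_output / 1_000_000 * output_price_per_mtok)


# Assumed example: 4,000 tokens of context re-sent on each of 10 calls,
# with 500 tokens generated per call.
example = AgentTask(context_tokens=4_000, output_tokens=500, model_calls=10)
input_total, output_total = task_tokens(example)
print(input_total + output_total)  # 45000 tokens for a single task

# Prices below are placeholders; substitute your provider's actual rates.
print(round(task_cost_usd(example, 3.0, 15.0), 3))
```

Under these assumptions, most of the 45,000 tokens are input that the agent re-reads on every call, which is why repeated calls within one task dominate the bill.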
How Companies Are Restructuring Their Approach
Executives are beginning to introduce the concept of "tokenomics": managing token consumption the same way they have managed server resources or cloud spending. The first practices have already emerged:
- limiting agent context windows: agents see only the relevant part of the codebase
- caching repeated prompts so tokens aren't recalculated from scratch
- task routing: cheap models for routine work, powerful ones for complex requests
- monitoring and alerts for abnormal spending
- reassessing ROI from AI tools based on actual, not forecasted costs
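Two of the practices above, context limiting and task routing, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the model names are placeholders rather than real model identifiers, the set of "routine" task types and the 8,000-token threshold are invented for the example, and `trim_context` and `choose_model` are hypothetical helpers.

```python
from dataclasses import dataclass

# Placeholder tier names (assumptions, not real model identifiers).
CHEAP_MODEL = "small-model"
POWERFUL_MODEL = "large-model"

# Assumed set of task types treated as routine work.
ROUTINE_TASKS = {"code_review", "test_writing", "refactoring"}


@dataclass
class Task:
    kind: str            # e.g. "code_review"
    context_tokens: int  # estimated size of the context the task needs


def trim_context(files: dict[str, str], relevant_paths: set[str]) -> dict[str, str]:
    """Keep only the files a task touches, so the agent sees part of the codebase."""
    return {path: text for path, text in files.items() if path in relevant_paths}


def choose_model(task: Task, max_cheap_context: int = 8_000) -> str:
    """Send routine tasks with small context to the cheap tier, everything else up."""
    if task.kind in ROUTINE_TASKS and task.context_tokens <= max_cheap_context:
        return CHEAP_MODEL
    return POWERFUL_MODEL


repo = {"app/cart.py": "...", "app/auth.py": "...", "README.md": "..."}
print(sorted(trim_context(repo, {"app/cart.py"})))  # ['app/cart.py']
print(choose_model(Task("code_review", 2_000)))     # small-model
print(choose_model(Task("architecture", 2_000)))    # large-model
```

A real router would likely use richer signals than task type and context size; the point here is only that the routing decision happens before any tokens are spent.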
"We used to think of AI as a SaaS subscription with fixed pricing.
Now we understand it's more like cloud computing: price depends on how much you use."
Anthropic and other providers offer tools for spending monitoring, but token management remains a headache on the client side.
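On the client side, monitoring can start as a simple running total per team with a warning threshold. The sketch below is an assumption-laden illustration, not a provider tool: `SpendMonitor`, the status strings, and the 80% warning ratio are all hypothetical, and the cost figures passed in would come from whatever usage data your provider reports.

```python
from collections import defaultdict


class SpendMonitor:
    """Tracks spend per team against an assumed daily budget."""

    def __init__(self, daily_budget_usd: float, alert_ratio: float = 0.8):
        self.daily_budget_usd = daily_budget_usd
        self.alert_ratio = alert_ratio  # warn once this share of budget is used
        self.spend_by_team = defaultdict(float)

    def record(self, team: str, cost_usd: float) -> str:
        """Add a cost and return 'ok', 'warning', or 'over_budget'."""
        self.spend_by_team[team] += cost_usd
        spent = self.spend_by_team[team]
        if spent >= self.daily_budget_usd:
            return "over_budget"
        if spent >= self.daily_budget_usd * self.alert_ratio:
            return "warning"
        return "ok"

    def reset(self) -> None:
        """Clear totals, for example at the start of a new day."""
        self.spend_by_team.clear()


monitor = SpendMonitor(daily_budget_usd=100.0)
print(monitor.record("backend", 50.0))  # ok
print(monitor.record("backend", 35.0))  # warning
print(monitor.record("backend", 20.0))  # over_budget
```

In practice the returned status would feed an alerting channel or pause the agent; how to react to a breach is a policy choice this sketch leaves open.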
What This Means
A business bet on AI coding as a way to reduce development expenses may not pay off if the true cost of tokens isn't accounted for. Companies that master "tokenomics" first, learning to optimize consumption without sacrificing results, will gain a tangible advantage over those managing AI costs blindly.