Anthropic's Opus 4.7 consumes more tokens: hidden inflation at unchanged prices
In Anthropic's Opus 4.7, the same text is tokenized into more tokens than in Opus 4.6. Formally, prices and context limits remain unchanged, but in practice…
AI-processed from Habr AI; edited by Hamidun News
Anthropic released Claude Opus 4.7, and initial tests through the API revealed something unexpected: the same text can take significantly more tokens in the new model than in Opus 4.6. Although the listed prices are unchanged, this amounts to a quiet but real increase in costs.
What's the Problem
A tokenizer is the component that splits text into fragments (tokens) before it is fed to the model. The number of these tokens determines both the cost of a request and how much of the context window it occupies. When Anthropic changes the tokenizer between versions, the same prompt starts to "weigh" differently, and in Opus 4.7 it got heavier. A request that used to take 1,000 tokens might now take 1,300–1,500. The price per million tokens has not changed, but you consume more tokens for the same work.
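To make the arithmetic concrete, here is a minimal Python sketch of per-request cost when the price per million tokens stays fixed but the token count grows. The rate of $15 per million input tokens is a hypothetical placeholder, not Anthropic's published price; the 1,000 versus 1,300 and 1,500 token counts are the illustrative figures from the text.

```python
# Minimal sketch: a fixed per-token price combined with more tokens per request.
# PRICE_PER_MTOK is a hypothetical placeholder, not Anthropic's actual rate.
PRICE_PER_MTOK = 15.00  # assumed USD per 1M input tokens


def request_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """USD cost of `tokens` input tokens at a flat per-million-token rate."""
    return tokens * price_per_mtok / 1_000_000


def effective_price(price_per_mtok: float, old_tokens: int, new_tokens: int) -> float:
    """Price per 1M tokens as counted by the OLD tokenizer, once the same text
    grows from old_tokens to new_tokens under the new tokenizer."""
    return price_per_mtok * new_tokens / old_tokens


old_tokens = 1_000
for new_tokens in (1_300, 1_500):
    print(
        f"{new_tokens} tokens: ${request_cost(new_tokens):.4f} per request "
        f"(was ${request_cost(old_tokens):.4f}); effective price "
        f"${effective_price(PRICE_PER_MTOK, old_tokens, new_tokens):.2f}/M"
    )
```

Under these assumptions a 1.3x token count behaves exactly like a 30% price increase: the listed rate stays at $15 per million, but the same text now effectively costs $19.50 per million old-tokenizer tokens.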
"This creates hidden inflation: prices and limits are listed as before, but in practice costs can rise" (from research by the Kodik team).
Kodik, a company developing a code editor that supports various AI models, independently tested the tokenizer's behavior through the API. Anthropic has not published official comparative data, so the developers ran their own tests and shared their findings.
Which Content Is Affected More
The token increase is not uniform across different types of text. Based on available data, the picture looks like this:
- Code — noticeable increase, especially in languages with many special characters: operators, brackets, indentation
- Technical strings (JSON, XML, YAML, SQL) — changes vary depending on structure; nested constructs may grow more
- System prompts — affected the same as user requests; for products with long system instructions this is particularly sensitive
- Plain text — moderate growth, less noticeable on short requests
- Mixed content (text + code + JSON) — behavior is unpredictable; worth testing for your specific scenario
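One way to see which content types are hit hardest in your own traffic is to group sample texts by category and compare total token counts under both tokenizers. The sketch below takes the two counters as plain functions; the whitespace-splitting and punctuation-splitting counters in the demo are toy stand-ins chosen only to show why symbol-heavy text inflates more. They are not Claude's tokenizers.

```python
import re
from typing import Callable, Dict, List

TokenCounter = Callable[[str], int]


def inflation_by_category(samples: Dict[str, List[str]],
                          count_old: TokenCounter,
                          count_new: TokenCounter) -> Dict[str, float]:
    """New/old token ratio per content category, summed over that category's samples."""
    ratios: Dict[str, float] = {}
    for category, texts in samples.items():
        old = sum(count_old(t) for t in texts)
        new = sum(count_new(t) for t in texts)
        ratios[category] = new / old if old else float("nan")
    return ratios


# Toy stand-ins, NOT Claude's tokenizers: the "old" one splits on whitespace,
# the "new" one also counts every punctuation character as a separate token.
def toy_old(text: str) -> int:
    return len(text.split())


def toy_new(text: str) -> int:
    return len(re.findall(r"\w+|[^\w\s]", text))


samples = {
    "plain text": ["The quick brown fox jumps over the lazy dog."],
    "code": ["if (x > 0) { y = f(x); }"],
    "json": ['{"a": {"b": [1, 2]}}'],
}
for category, ratio in inflation_by_category(samples, toy_old, toy_new).items():
    print(f"{category}: {ratio:.2f}x")
```

With these toy counters, the code and JSON samples inflate far more than plain text because every bracket, quote and operator becomes its own token; the real ratios for Opus 4.7 can only be measured against the actual tokenizers.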
It's important to consider scale: if a prompt grows by 20% and you send a million requests a day, the input-token part of your bill also grows by about 20%, which at that volume is a very significant sum.
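The scale argument is plain multiplication. This sketch assumes, purely for illustration, an average prompt of 1,000 input tokens and a hypothetical price of $5 per million input tokens; only the 20% growth and the million requests a day come from the text.

```python
# Minimal sketch of how a per-prompt token increase scales with traffic.
# BASE_TOKENS and PRICE_PER_MTOK are hypothetical; plug in your own numbers.
REQUESTS_PER_DAY = 1_000_000  # "a million requests a day" from the text
BASE_TOKENS = 1_000           # assumed average input tokens per request
PRICE_PER_MTOK = 5.00         # assumed USD per 1M input tokens
GROWTH = 0.20                 # prompt grows by 20% under the new tokenizer


def daily_cost(requests_per_day: int, tokens_per_request: float,
               price_per_mtok: float) -> float:
    """Daily USD spend on input tokens at a flat per-million-token rate."""
    return requests_per_day * tokens_per_request * price_per_mtok / 1_000_000


before = daily_cost(REQUESTS_PER_DAY, BASE_TOKENS, PRICE_PER_MTOK)
after = daily_cost(REQUESTS_PER_DAY, BASE_TOKENS * (1 + GROWTH), PRICE_PER_MTOK)
print(f"before: ${before:,.0f}/day, after: ${after:,.0f}/day, "
      f"extra per 30-day month: ${(after - before) * 30:,.0f}")
```

Under these assumptions the daily input bill moves from $5,000 to $6,000: an extra $1,000 a day, or about $30,000 over a 30-day month, with no change to the listed price.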
Why This Is a Systemic Problem
The situation with Opus 4.7 is not an exception. The tokenizer can change in any model from any provider, and such changes are not always spelled out in release notes. For teams building products on top of APIs, this creates several risks.
Budget surprises. Limits calculated from historical data can suddenly be exceeded after a model change, even if the requests themselves have not changed.
Context window overflow. A system that previously fit within 128k tokens might start truncating context or returning an error after an update.
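A pre-upgrade check for this risk can be as simple as scaling your largest prompt by the measured inflation ratio and adding the output tokens you reserve. The 128,000-token window is the figure from the text; the prompt size, output reserve and ratios below are hypothetical, and integer percentages are used so the rounding is exact.

```python
# Minimal sketch: does a prompt that fit before still fit after retokenization?
# All concrete numbers are hypothetical examples except the 128k window from the text.
CONTEXT_WINDOW = 128_000


def projected_tokens(old_tokens: int, ratio_pct: int) -> int:
    """Scale an old token count by ratio_pct percent (130 means 1.3x), rounding up."""
    return (old_tokens * ratio_pct + 99) // 100


def headroom(old_prompt_tokens: int, ratio_pct: int, max_output_tokens: int,
             window: int = CONTEXT_WINDOW) -> int:
    """Tokens left in the window after the projected prompt plus reserved output.
    A negative value means the request would overflow."""
    return window - projected_tokens(old_prompt_tokens, ratio_pct) - max_output_tokens


for ratio_pct in (100, 130, 150):
    left = headroom(95_000, ratio_pct, max_output_tokens=4_000)
    status = "fits" if left >= 0 else "overflows"
    print(f"{ratio_pct / 100:.1f}x: {status} (headroom {left:,} tokens)")
```

With a 95,000-token prompt and 4,000 tokens reserved for output, a 1.3x increase leaves only 500 tokens of headroom, and a 1.5x increase overflows the window by 18,500 tokens.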
Unfair A/B tests. When you compare the quality of two model versions on the same data, different tokenizers mean the models receive technically different input, which complicates the interpretation of the results.
Hidden regression in RAG pipelines. If you pack chunks by token limit, a tokenizer change can break your splitting logic without a single error in the logs.
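To catch this kind of silent breakage, chunks packed with the old counter can be re-measured with the new one before the upgrade. The greedy packer and the two toy counters below are illustrative assumptions, not any specific RAG library's API; in practice both counters would return real token counts for the respective models.

```python
import re
from typing import Callable, List

TokenCounter = Callable[[str], int]
SEP = "\n\n"  # separator used when joining paragraphs into a chunk


def pack_chunks(paragraphs: List[str], count: TokenCounter, limit: int) -> List[str]:
    """Greedily join paragraphs into chunks whose token count stays <= limit."""
    chunks: List[str] = []
    current: List[str] = []
    for p in paragraphs:
        candidate = SEP.join(current + [p])
        if current and count(candidate) > limit:
            chunks.append(SEP.join(current))
            current = [p]  # a single oversized paragraph still becomes its own chunk
        else:
            current.append(p)
    if current:
        chunks.append(SEP.join(current))
    return chunks


def oversized(chunks: List[str], count: TokenCounter, limit: int) -> List[int]:
    """Indices of chunks that exceed `limit` under a (possibly different) counter."""
    return [i for i, c in enumerate(chunks) if count(c) > limit]


# Toy stand-ins for the old and new tokenizers (NOT Claude's real tokenizers).
def toy_old(text: str) -> int:
    return len(text.split())


def toy_new(text: str) -> int:
    return len(re.findall(r"\w+|[^\w\s]", text))


chunks = pack_chunks(["alpha beta gamma", "f(x) = y;", "delta epsilon"], toy_old, limit=5)
print(chunks, oversized(chunks, toy_new, limit=5))
```

Here the second chunk sits exactly at the 5-token limit under the old counter but measures 9 under the new one, so `oversized` flags it even though packing itself raised no error.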
What This Means
If you already use Opus in production or plan to switch to 4.7, run your actual prompts through the token-counting API for both model versions before switching. This takes a few hours but lets you honestly assess the cost increase and adjust your budget or architecture accordingly. The Opus 4.7 case is a good reminder: when upgrading a model, check not only answer quality but also tokenization efficiency.
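The pre-migration check can be organized as a small audit over a sample of real prompts, system prompts included. This is a sketch under the assumption that you supply two counting functions, one per model version; the word-based and symbol-based counters in the demo are placeholders only.

```python
import math
import re
import statistics
from typing import Callable, Dict, Iterable, List

TokenCounter = Callable[[str], int]


def audit(prompts: Iterable[str], count_old: TokenCounter,
          count_new: TokenCounter) -> Dict[str, float]:
    """Summarize how token counts for real prompts change between two tokenizers."""
    old_total = new_total = 0
    ratios: List[float] = []
    for prompt in prompts:
        old, new = count_old(prompt), count_new(prompt)
        if old == 0:  # skip empty prompts to avoid dividing by zero
            continue
        old_total += old
        new_total += new
        ratios.append(new / old)
    if not ratios:
        raise ValueError("no non-empty prompts to audit")
    ratios.sort()
    p95_index = math.ceil(0.95 * len(ratios)) - 1  # nearest-rank 95th percentile
    return {
        "total_ratio": new_total / old_total,
        "median_ratio": statistics.median(ratios),
        "p95_ratio": ratios[p95_index],
        "max_ratio": ratios[-1],
    }


if __name__ == "__main__":
    # Placeholder counters for illustration only; swap in real per-model token counts.
    def words(s: str) -> int:
        return len(s.split())

    def symbols(s: str) -> int:
        return len(re.findall(r"\w+|[^\w\s]", s))

    sample = ["Summarize this report in three bullet points.",
              "def f(x): return {'k': [x, x + 1]}"]
    print(audit(sample, words, symbols))
```

In practice each counter could wrap Anthropic's token-counting endpoint; in the Python SDK this is `client.messages.count_tokens(...)`, which returns an object with an `input_tokens` field. Model identifiers are deliberately left out here; check the current documentation for the exact Opus 4.6 and 4.7 IDs. Looking at the p95 and maximum ratios, not just the total, shows whether some prompt types, such as code-heavy ones, inflate far more than the average.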