Anthropic Changed Claude Opus 4.7's Character—And Some Developers Saw a Regression
Anthropic released Claude Opus 4.7 at the same price point with strong benchmarks, but the community's reaction proved harsh. Developers complain the model…
AI-processed from Habr AI; edited by Hamidun News
On April 16, 2026, Anthropic released Claude Opus 4.7 and kept the price unchanged, but within 24 hours some developers called the update a regression. The issue is not a single benchmark failure, but a shift in the model's behavior: it became drier, more literal, and notably more inclined to argue with the user.
Why the reaction is sharp
On paper, the release looked very strong. Anthropic claimed wins in 12 of 14 benchmarks, gains in SWE-bench Verified, MCP-Atlas, and several other tests, as well as improvements for vision and long-running agent tasks. The price remained the same.
But almost immediately after launch, Reddit and X were flooded with complaints: users reported that Claude Opus 4.7 more often argues with instructions, refuses simple actions, and sometimes confidently defends an incorrect answer instead of simply admitting a mistake. The problem turned out to be not so much a general quality issue as a mismatch between the model's new character and the familiar way of working with it.
Where Claude was previously too accommodating, it is now stricter and more literal. For some tasks this is a plus, but for routine development it turned out the opposite: the model begins to argue about trivialities, slows down the workflow, and adds noise.
'The model argues nonstop and hallucinates while arguing'.
Seven new defaults
The main conclusion from the release is this: Anthropic changed not just metrics, but the model's basic behavioral settings. If a team has spent a long time tuning prompts for Opus 4.6, switching to 4.7 can break an already working pipeline even without API changes. This is a new type of breaking change for LLM: the interface is the same, but the model interprets the task differently.
- more literal adherence to instructions instead of reading between the lines
- response length now depends more on how the model itself assessed task complexity
- by default there are fewer tool calls and less delegation of subtasks
- intermediate progress updates are now more often given by the model itself, without additional scaffolding
- cybersecurity and filters were strengthened, and the tone of responses became drier and less 'agreeable'
Because of this, old prompts with vague formulations like 'make it nice' work worse. What was previously compensated for by the model's intuition now needs to be described as a spec: response format, constraints, desired depth, tool rules, and security boundaries. Anthropic itself recommends running regression tests on real traffic before migration, and in the case of 4.7 this sounds not like a formality but like a mandatory step.
Where better, where worse
The update has obvious strengths. According to Anthropic's description and early reviews, 4.7 better maintains long threads in agent scenarios, works more confidently at high and xhigh effort levels, is stronger at multi-file refactoring, and noticeably wins in vision: the input image limit grew to roughly 3.75 megapixels versus the previous 1.15. For tasks where autonomy, self-checking, and long planning horizons matter, such a model can indeed be more useful than Opus 4.6.
The weaknesses showed up in a developer's everyday work. Simple edits like renaming variables, adding null checks, or local refactoring more often turn into arguments with the assistant. Users separately complain about increased token spending, which makes the same scenarios more expensive, and degradation in long-context retrieval. Against this backdrop, the safety tradeoff is also troubling: Anthropic openly stated that during training it selectively weakened certain cyber capabilities and added automatic safeguards, leaving the stronger version to partners. Additionally, the company quietly removed Claude Code from the $20 Pro plan on April 21, 2026, reinforcing the feeling that conditions for regular users have gotten worse.
What this means
The Claude Opus 4.7 story shows that new LLM versions now need to be evaluated not just by benchmarks but by changes in the model's 'character'. If previously a prompt could be written as a request to a colleague, now increasingly a precise spec format is needed. For teams this means one thing: before upgrading a model, you need to test not abstract intelligence but your actual workflow.
Want to stop reading about AI and start using it?
AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.