SimpleOne showed why Claude and AI reviewers can’t distinguish AI-generated code from human-written code
AI-processed from Habr AI; edited by Hamidun News
SimpleOne proposed a simple test: tell human-written code apart from AI-generated code. It proved hard not only for developers but also for the AI reviewer, which checks code against the same patterns that produced it.
Why It's Difficult
According to the article, the team collected several pairs of functions: email validation, an API request, configuration loading, and discount calculation. In each pair, one version was written by a human and the other by a model. On paper the task looks simple: AI code usually has more type annotations, more comments, and a pseudo-neat look.
In practice, it is harder. Some examples do give themselves away through excessive correctness, but by the third or fourth pair it becomes clear that style alone does not always reveal the author. The strongest example is the discount calculation function, where the difference between the versions almost disappears.
Both fragments work, both pass static checks, both look reasonable. But the real error hides not inside the function but at the place where it is called. If the discount is supposed to be fixed at the moment the order is placed, it must not be recalculated every time the page is opened.
This is no longer a question of syntax or best practices, but of business logic and project context, which an automatic reviewer usually doesn't have.
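The call-site problem can be sketched as follows. This is a minimal illustration, not the code from the SimpleOne article: the names `calculate_discount`, `Order`, `place_order`, and both `render_order_page` variants are hypothetical, and the percentage-rate discount model is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


def calculate_discount(total: Decimal, rate: Decimal) -> Decimal:
    """Return the discount amount for a total at the given rate (hypothetical helper)."""
    return (total * rate).quantize(Decimal("0.01"))


@dataclass
class Order:
    total: Decimal
    discount: Decimal  # fixed once, when the order is placed


def place_order(total: Decimal, current_rate: Decimal) -> Order:
    # Correct place to call the function: the discount is stored with the order.
    return Order(total=total, discount=calculate_discount(total, current_rate))


def render_order_page_buggy(order: Order, current_rate: Decimal) -> Decimal:
    # Bug at the call site: the discount is recomputed on every page view,
    # so the amount due changes whenever the current rate changes.
    return order.total - calculate_discount(order.total, current_rate)


def render_order_page(order: Order) -> Decimal:
    # Uses the discount that was fixed at placement time.
    return order.total - order.discount


order = place_order(Decimal("200.00"), Decimal("0.10"))
later_rate = Decimal("0.05")  # the promotion changed after the order was placed
amount_buggy = render_order_page_buggy(order, later_rate)
amount_fixed = render_order_page(order)
```

Note that `calculate_discount` is identical in both paths and would pass any style or static check. Only knowing the business rule ("the discount is fixed at placement") tells a reviewer which caller is wrong.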
What Gives Synthetic Code Away
SimpleOne highlights several signs by which AI code can sometimes still be detected. These are not gross errors but a suspiciously "perfect" style: the code looks neat, yet behaves as if it were trying to impress the reviewer rather than solve a specific production task. Such signals appeared most often in the email, API, and config examples.
- Overly detailed docstrings with enumeration of Args, Returns, and each exception even for a tiny function.
- Unnecessary edge cases and checks that no one asked for and that don't follow from the task statement.
- Invented entities around the code: custom exceptions, loggers, or required fields that don't exist in the project.
- "Safe" fallbacks like data.get('user', {}), which don't fail immediately but instead mask the real problem.
From this comes the main risk: AI often writes not overtly bad code, but code that looks convincing. It's typed, formatted, commented, and accounts for many scenarios. But some of these scenarios are made up, and some of the protective constructs only complicate debugging. Such code is easy to accept as high-quality if you look at the form and don't check how it's integrated into a specific system.
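The difference between a "safe" fallback and a fail-fast lookup can be shown with a small sketch. Only the `data.get('user', {})` pattern comes from the article; the function names and the `email` field are hypothetical.

```python
def get_user_email_masked(data: dict) -> str:
    # "Safe" style: a missing "user" key silently becomes an empty string,
    # so the failure surfaces later and far from its cause.
    return data.get("user", {}).get("email", "")


def get_user_email_strict(data: dict) -> str:
    # Fail-fast style: a missing key raises KeyError right here,
    # pointing at the malformed payload.
    return data["user"]["email"]


payload_without_user = {}
masked_result = get_user_email_masked(payload_without_user)

try:
    get_user_email_strict(payload_without_user)
    strict_failed = False
except KeyError:
    strict_failed = True
```

Which variant is right depends on whether a missing `user` is a legitimate state in the system. That is exactly the kind of context a style-based review does not have.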
Where Review Breaks
The problem SimpleOne describes does not come down to the quality of one model. If Claude generates code and another AI tool then checks it against the same patterns, the team gets a closed loop of synthetic confidence. Everything looks green: there are types, there is error handling, there are comments. But the review may miss that UserFetchError does not exist in the project, that secret_key appeared without any requirement calling for it, and that a "silent" return of an empty dictionary will later cost hours of bug hunting.
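One mechanical guard against invented entities such as UserFetchError is to check that every exception a file raises or catches is actually defined somewhere. The sketch below is a deliberately rough, single-file approximation using the standard `ast` module. The function name `undefined_exception_names` and the sample code are hypothetical, and a real project would have to resolve names across modules.

```python
import ast
import builtins


def undefined_exception_names(source: str) -> set[str]:
    """Return exception names raised or caught in `source` that are neither
    defined or imported in it nor Python builtins (single-file approximation)."""
    tree = ast.parse(source)

    # Names the module itself makes available: classes and imports.
    defined: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])

    # Names used as exceptions: `raise Name(...)` and `except Name:`.
    # Re-raising a variable (`raise exc`) is ignored on purpose.
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Raise) and isinstance(node.exc, ast.Call):
            if isinstance(node.exc.func, ast.Name):
                used.add(node.exc.func.id)
        elif isinstance(node, ast.ExceptHandler) and node.type is not None:
            types = node.type.elts if isinstance(node.type, ast.Tuple) else [node.type]
            used.update(t.id for t in types if isinstance(t, ast.Name))

    return {name for name in used if name not in defined and not hasattr(builtins, name)}


SAMPLE = '''
def fetch_user(client, user_id):
    try:
        return client.get(user_id)
    except ConnectionError:
        raise UserFetchError("failed to fetch user")
'''

missing = undefined_exception_names(SAMPLE)
```

A check like this only catches names that do not exist. It says nothing about whether an existing exception is the right one to raise, which still requires a reviewer who knows the project.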
"The AI-reviewer found 2 out of 5."
The team tested this on real bugs from production. The automatic reviewer caught only two simple cases: an unused variable and a missing null check. What it missed were things that required project context: a logical error in the calculation, a race condition with parallel requests, and an incorrect migration order. The conclusion is practical: AI works well as a first filter, but doesn't replace a human in critical areas where understanding domain logic and the consequences of changes is important.
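Testing a reviewer against old bugs can be as simple as keeping a labeled list and computing how many it catches. The sketch below encodes the five cases reported above. The `KnownBug` structure, the `needs_project_context` label (our reading of "simple" versus context-dependent cases), and the `recall` helper are assumptions, not SimpleOne's tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class KnownBug:
    name: str
    needs_project_context: bool  # assumed label, based on the article's description
    caught_by_reviewer: bool


# The five production bugs and outcomes as reported in the article.
KNOWN_BUGS = [
    KnownBug("unused variable", False, True),
    KnownBug("missing null check", False, True),
    KnownBug("logical error in the calculation", True, False),
    KnownBug("race condition with parallel requests", True, False),
    KnownBug("incorrect migration order", True, False),
]


def recall(bugs: Iterable[KnownBug]) -> float:
    """Share of known bugs the reviewer caught; 0.0 for an empty set."""
    bugs = list(bugs)
    if not bugs:
        return 0.0
    return sum(b.caught_by_reviewer for b in bugs) / len(bugs)


overall = recall(KNOWN_BUGS)
context_recall = recall(b for b in KNOWN_BUGS if b.needs_project_context)
simple_recall = recall(b for b in KNOWN_BUGS if not b.needs_project_context)
```

Splitting recall by category makes the pattern visible: the overall "2 out of 5" hides the fact that every context-dependent bug in this sample was missed.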
What This Means
Widespread use of AI in development changes not only the speed of code writing, but also the quality control process itself. Checking generated code with another generative tool is useful, but not sufficient. If the team doesn't test the reviewer against old bugs and doesn't keep a human in the decision-making chain, it risks getting not quality, but a very convincing imitation of it.