Turnitin and OpenAI Are Losing: Why AI Detectors Can No Longer Distinguish Humans From Models

AI detectors are rapidly losing relevance: modern language models now imitate human speech too convincingly. Research shows accuracy at the level of random guessing, and paraphrasing further obscures generation traces. In practice, this harms students, authors, and editorial teams: some texts pass without issue, others receive false accusations, and the market still lacks a reliable automated arbiter.

Khamidun Zhemal

Apr 30, 2026 · 3 min

AI-processed from Habr AI; edited by Hamidun News

Language models have reached a point where it is increasingly difficult to tell from a single text whether a human or a machine wrote it. As a result, AI detectors are becoming a weak filter: they let synthetic content through while making more and more mistakes on real authors.

Why Models Sound Human

Not long ago, it seemed that machine-generated text could be identified by its sterile smoothness, repetitive phrases, and overly correct structure. But modern LLMs have moved far beyond the primitive 'guess the next word.' On top of basic language prediction, mechanisms have emerged that help maintain meaning, mimic intonation, and adapt responses to a specific audience.

The model learns not just to speak coherently, but to sound like a human writing a term paper, arguing in a chat, or explaining a topic to a colleague. Several layers work toward this goal. Style transfer helps reproduce individual writing patterns, fine-tuning polishes speech on real examples, inference uses conversation context, and RLHF aligns responses with human expectations around logic, politeness, and naturalness.

As a result, the former distance between 'machine' and 'human' text disappears. The model can be dry and academic, conversational and uneven, or even deliberately rough if such a style better passes for genuine speech.

Where Detectors Break Down

Against this backdrop, the detectors themselves are losing ground. A 2025 study cited by the author showed an almost even result: both humans and algorithms identified AI-generated texts with an accuracy of around 57%. That is no longer a control tool; it is close to a coin flip. A separate problem is paraphrasing: once a text is run through a rewriting pass, the statistical traces of generation are erased even more thoroughly. The better models get at rewriting their own output, the worse detectors that look for old signatures perform.
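To put the 57% figure in perspective, here is a minimal sketch that rescales accuracy so that 0 means pure guessing and 1 means a perfect detector. It assumes a balanced test set (half human, half AI texts), which the article does not state; the function name skill_over_chance is hypothetical.

```python
def skill_over_chance(accuracy: float, chance: float = 0.5) -> float:
    """Rescale accuracy so 0.0 means guessing and 1.0 means a perfect detector.

    `chance` is the accuracy of blind guessing. The default of 0.5 ASSUMES a
    balanced test set (half human, half AI); the article does not say so.
    """
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must be in [0, 1)")
    return (accuracy - chance) / (1.0 - chance)


reported_accuracy = 0.57  # figure quoted in the article
skill = skill_over_chance(reported_accuracy)
print(f"skill over chance: {skill:.2f}")
```

Under that assumption, a 57% detector covers only about 14% of the distance between a coin flip and a perfect classifier.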

'GPT or not GPT?' Too often, verification today comes down to exactly this question.

In studies, detection accuracy increasingly approaches random guessing.
Repeated paraphrasing removes formulaic patterns, predictability, and other visible markers of generation.
False positives hit real authors harder than those who use AI extensively.
OpenAI shut down its AI Classifier after weak results: the tool detected only about 26% of generated texts.

The most painful effect is errors against humans. In 2023, a high-profile Turnitin case involved a schoolgirl whose essay the system flagged as almost entirely AI-generated. Later, independent verification showed that the detector itself is far from infallible and recognizes only part of machine-generated texts. This asymmetry is dangerous: generated text can pass undetected while a conscientious author suffers reputational damage. When a diploma, scientific publication, or certification is at stake, the cost of a single error becomes too high.
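The asymmetry can be made concrete with Bayes' rule. The sketch below estimates what share of flagged texts are actually AI-generated. Only the 26% detection rate comes from the article (the figure quoted for OpenAI's AI Classifier); the false positive rate and the share of AI-written submissions are illustrative assumptions, and the function name is hypothetical.

```python
def flagged_is_ai_probability(
    detection_rate: float,
    false_positive_rate: float,
    ai_share: float,
) -> float:
    """Probability that a flagged text is really AI-generated (Bayes' rule)."""
    true_flags = detection_rate * ai_share                 # AI texts that get flagged
    false_flags = false_positive_rate * (1.0 - ai_share)   # human texts that get flagged
    total_flags = true_flags + false_flags
    if total_flags == 0:
        raise ValueError("no text would be flagged with these rates")
    return true_flags / total_flags


DETECTION_RATE = 0.26       # from the article (OpenAI AI Classifier)
FALSE_POSITIVE_RATE = 0.09  # ASSUMPTION: illustrative, not from the article
AI_SHARE = 0.10             # ASSUMPTION: 1 in 10 submissions is AI-written

ppv = flagged_is_ai_probability(DETECTION_RATE, FALSE_POSITIVE_RATE, AI_SHARE)
print(f"share of flags that are really AI: {ppv:.1%}")
print(f"share of flags that hit human authors: {1 - ppv:.1%}")
```

With these assumed inputs, roughly three out of four flags would land on human authors. The exact ratio depends entirely on the assumed rates, but the direction holds whenever detection is weak and most submissions are human-written.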

What Needs to Change Now

The main conclusion for education and science is straightforward: verification should focus not just on the final text, but on the entire process of its creation. The more a system rewards volume, formal structure, and bureaucratic padding, the easier it is to deceive with LLMs. Therefore, it's more useful to shift emphasis to oral defense, drafts, edit history, source quality, reproducibility of conclusions, and the author's ability to explain their argument without notes.

Where an author has to demonstrate understanding of a topic rather than simply produce a lot of smooth text, the model has less room for undetected substitution. Outside academic settings, the problem is no less practical. Weak detectors don't stop disinformation and are nearly useless against social engineering, where speed, scale, and convincing tone matter.

Therefore, in newsrooms, companies, and regular correspondence, what's needed is not a 'magical AI scanner,' but proper verification: fact-checking from multiple sources, identity confirmation through a second channel, and attention to messages that are too quick and too polished. Smooth text can be a signal, but not proof, and that is precisely what makes human verification central once again.

What This Means

An AI detector can no longer be regarded as a judge rendering a final verdict on text. At best, it's a supporting indicator. Trust now shifts from the surface of the text to its origin, the process of creation, and the author's ability to confirm they truly understand what was written.

