How to teach a language model to write so it’s indistinguishable from a human newsroom
Habr published a breakdown of an interesting technical challenge: how to teach a language model to write not just good texts, but texts in the style of a specif
AI-processed from Habr AI; edited by Hamidun News
A prompt like 'write like a journalist' doesn't work. This is the first thing anyone discovers when trying to use language models to generate content for a specific media outlet. The text comes out smooth, grammatically correct, sometimes even engaging—but it doesn't sound like the intended publication. It sounds like ChatGPT pretending to be a journalist. A team of developers set out to solve exactly this problem, with their detailed technical breakdown appearing on Habr.
The post's author—Lena, for whom this is her first publication on the platform—describes the task with disarming honesty. The goal wasn't for the model to write 'well.' The goal was for the text to be indistinguishable from what was written by a specific editorial team: a specific city portal, a specific Telegram channel, a specific niche publication. The difference between these two formulations is a chasm that swallows most attempts to automate content.
Why is this even difficult? A publication's style isn't a set of rules that can be written into a system prompt. It's hundreds of implicit patterns: sentence length, frequency of colloquial expressions, preference for certain syntactic constructions, characteristic ways of starting and ending paragraphs, even typical 'imperfections'—like a specific editorial team's habit of overusing dashes or putting periods after every list item. Language models by default average all this down to some 'generically well-written text' that belongs to no one.
The naive approach—a detailed prompt describing the style—hits a ceiling almost immediately. You can write: 'use short sentences, conversational tone, start with a provocative question.' The model will dutifully follow instructions, but the result will be a caricature, not an imitation. It's like asking an actor to play 'a sad person'—they'll show you a stereotype of sadness, not a specific sad person. A stylistic prompt describes a genre, not a voice.
The next logical step is few-shot prompting, where the model is given several exemplary texts from the target publication directly in the request context. This works noticeably better, but creates new problems. The context window isn't infinite, and the more examples you load, the less space remains for the actual task. Moreover, the model starts copying specific phrases and facts from the examples rather than abstracting the style. It memorizes the surface, not the structure.
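To make the context-window trade-off concrete, here is a minimal sketch of few-shot prompt assembly under a fixed size budget. It is not the team's code: the function name `build_few_shot_prompt`, the instruction wording, and the use of character count as a rough stand-in for tokens are all assumptions made for illustration.

```python
def build_few_shot_prompt(examples: list[str], task: str, max_chars: int) -> str:
    """Pack as many style examples as fit, always leaving room for the task.

    Hypothetical helper: character count is only a crude proxy for tokens.
    """
    header = (
        "Write in the style of the examples below. "
        "Do not reuse their facts or phrases.\n\n"
    )
    footer = f"Task:\n{task}\n"

    # Reserve space for the instruction and the task first;
    # examples only get whatever budget is left over.
    budget = max_chars - len(header) - len(footer)

    chosen = []
    for i, example in enumerate(examples, start=1):
        block = f"Example {i}:\n{example}\n\n"
        if len(block) > budget:
            break  # every extra example eats into the remaining space
        chosen.append(block)
        budget -= len(block)

    return header + "".join(chosen) + footer


if __name__ == "__main__":
    samples = ["a" * 50, "b" * 50, "c" * 50]
    print(build_few_shot_prompt(samples, "Write about X", max_chars=250))
```

With a budget of 250 characters only the first two of the three samples fit, which is the squeeze described above. The explicit "do not reuse" instruction is one common attempt to reduce phrase copying; the source does not say whether the team used anything like it.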
The solution the team eventually arrives at combines several approaches. Fine-tuning on a corpus of the publication's texts lets the model 'absorb' stylistic patterns at the level of weights rather than context. But there are pitfalls here too: you need a sufficient volume of data, the corpus has to be filtered carefully, and, most interestingly, you need metrics that measure stylistic similarity, not just text quality. Standard metrics like perplexity or BLEU are of little use here, because they measure fluency or n-gram overlap with a reference rather than closeness to a particular voice. The team developed their own metrics, analyzing sentence length distribution, lexical diversity, frequency of stylistic markers, and other parameters that together form a 'fingerprint' of the publication.
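The kind of 'fingerprint' described here can be sketched in a few lines. This is an illustration only, not the team's metrics: the feature set, the names `style_fingerprint` and `style_distance`, the naive sentence splitter, and the choice of the long dash as the single stylistic marker are all assumptions.

```python
import re
from statistics import mean, pstdev


def style_fingerprint(text: str) -> dict[str, float]:
    """Compute a few surface-level style features (hypothetical feature set)."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"\w+", text.lower())
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        # Sentence length distribution, summarized by mean and spread.
        "mean_sentence_len": mean(lengths) if lengths else 0.0,
        "std_sentence_len": pstdev(lengths) if lengths else 0.0,
        # Lexical diversity as a simple type-token ratio.
        "type_token_ratio": len(set(words)) / n_words,
        # One example of a stylistic marker: long dashes per 100 words.
        "dashes_per_100_words": 100 * text.count("\u2014") / n_words,
    }


def style_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Mean relative difference across features; 0.0 means identical."""
    total = 0.0
    for key in a:
        scale = max(abs(a[key]), abs(b[key]), 1e-9)
        total += abs(a[key] - b[key]) / scale
    return total / len(a)


if __name__ == "__main__":
    reference = style_fingerprint("Short one. Another short one here.")
    candidate = style_fingerprint("This is a much longer and more formal sentence overall.")
    print(reference)
    print(style_distance(reference, candidate))
```

In practice the reference fingerprint would be computed over a large sample of the publication's texts and compared against model output. A type-token ratio is sensitive to text length, so the samples being compared should be of similar size.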
This case is interesting not just as a technical challenge. It highlights a fundamental question about the future of media: if a model can be taught to imitate an editorial style indistinguishably, what does this mean for the very concept of editorial voice? On the one hand, it's a powerful scaling tool—a small editorial team can generate more content while preserving stylistic coherence. On the other hand, it blurs the line between authorship and imitation. If a reader can't distinguish a model's text from a journalist's text, who is the author?
There's a practical side too. The content market is already flooded with generic AI texts that all sound the same. Publications able to maintain a unique voice—even with the help of finely tuned models—gain a competitive advantage. The paradox is that technology that threatens to depersonalize content can become an instrument for preserving its individuality.
The publication on Habr is essentially open documentation of an approach that many media companies are developing behind closed doors. And it's precisely this openness that makes it valuable. The task of stylistic imitation will only become more complex as publications begin to demand of AI tools not just competence, but character. Those who learn to solve this problem systematically, rather than through endless prompt rewriting, will set the standard for AI content quality in the years to come.