Gemini 3.1 Pro outperformed ChatGPT 5.4 and Claude Opus 4.6 in a text generation test
AI-processed from Habr AI; edited by Hamidun News
Gemini 3.1 Pro won one author's comparison of text-generation models, finishing ahead of ChatGPT 5.4 and Claude Opus 4.6. The gap was small, but the author called Gemini the most balanced tool for literary and emotional tasks.
How It Was Tested
The comparison was not built on code, search, or math, but on what remains the primary use case for AI among mainstream users: writing text. The three models were given four assignments: comedic science fiction, classic fantasy, psychological horror, and a short emotional story about a person lost in the forest. Each round was worth a maximum of three points, and scores reflected genre fit, readability, appropriateness of details, and overall impression of the result.
The approach is subjective, but that is precisely its value. Such a test does not measure abstract intelligence; it shows how a model behaves in a real editorial task: whether it maintains tone, preserves structure, avoids excessive verbosity, and conveys emotion without unnecessary explanation. The author separately notes that some models tend to overload text, while others choose brevity. For generating posts, drafts, stories, and scripts, this often matters more than dry benchmarks.
Results by Model
Gemini 3.1 Pro posted the best result: 11.5 out of a possible 12 points. It handled the comedy assignment confidently and outperformed the others on the emotional story about anxiety in the forest. Claude Opus 4.6 finished with 11 points and ChatGPT 5.4 with 10. None of them failed: all three models performed at a high level and differed more in style than in quality.
- Gemini 3.1 Pro — 11.5 points; strong in genre fit and concise delivery
- Claude Opus 4.6 — 11 points; builds atmosphere well, but sometimes overloads text
- ChatGPT 5.4 — 10 points; stable, but occasionally makes stylistically questionable choices
- In the horror task, all three models received the same score: 2.5 points
- The author called the OpenAI model the most economical on price
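The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers reported in the article (four rounds, three points each, the three totals, and the shared horror score); the per-round breakdown for the other tasks is not published, so the code derives only aggregates. The names `summarize`, `share_of_max`, and `other_rounds` are illustrative assumptions, not anything from the original test.

```python
# Figures reported in the article; per-round scores other than horror are not published.
ROUNDS = 4
MAX_PER_ROUND = 3.0
MAX_TOTAL = ROUNDS * MAX_PER_ROUND  # 12 points in total

TOTALS = {
    "Gemini 3.1 Pro": 11.5,
    "Claude Opus 4.6": 11.0,
    "ChatGPT 5.4": 10.0,
}
HORROR_SCORE = 2.5  # identical for all three models


def summarize(totals):
    """Rank models by total and derive aggregates from the published numbers."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        {
            "model": model,
            "total": total,
            # Fraction of the 12 available points that the model earned.
            "share_of_max": round(total / MAX_TOTAL, 3),
            # Points earned across the three non-horror rounds (maximum 9).
            "other_rounds": total - HORROR_SCORE,
        }
        for model, total in ranked
    ]


summary = summarize(TOTALS)
for row in summary:
    print(f'{row["model"]}: {row["total"]}/{MAX_TOTAL} '
          f'({row["share_of_max"]:.1%}), other rounds: {row["other_rounds"]}/9.0')
```

One thing the aggregates make visible: since Gemini's only lost half-point is accounted for by the horror round, its remaining 9 points imply full marks in the other three rounds.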
Why Gemini Is Ahead
The main reason for Gemini's victory, according to the author, is balance. The model does not try to impress with answer length, does not oversimplify each scene, and does not lose the genre framework. In the comedy story, this showed up as a livelier pace and effective humor, and in the emotional text as a clear escalation from denial to panic and despair. For content tasks, this is critical: if a model writes shorter but more accurately, an editor can work with the text more easily and spend less time polishing it.
The competitors had their weak points. ChatGPT 5.4, by the author's observation, sometimes marks the story structure too explicitly; for example, it emphasizes chapter climaxes, which makes the text feel less natural. Claude Opus 4.6, conversely, builds atmosphere well, especially in stories about isolation and paranoia, but at times becomes too elaborate and analytical. This does not ruin the quality, but it weakens the emotional impact, which in a literary text should land faster.
The overall winner was Gemini 3.1 Pro, although the other two participants also showed good results.
The author separately notes that they do not consider this result a universal verdict on the market. Model choice still depends on taste and task: for some, ChatGPT's clarity and predictability matter more, while others prefer Claude's atmosphere. Moreover, for basic text generation, in their opinion, free solutions like DeepSeek might be sufficient. But judged specifically on the combination of style, conciseness, and genre accuracy in this test, Gemini's lead looks deserved.
What This Means
For editors, authors, and content teams, the conclusion is quite practical: the goal is to find not the "smartest" model in general, but the one that best handles a specific format. In this comparison, Gemini 3.1 Pro turned out to be the most balanced option for text tasks, but the difference between participants is small. In real work, that means choosing the winner not by name recognition, but by how many corrections the first draft needs.
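One rough way to put a number on "how many corrections" is to compare each model's first draft with the version an editor finally approved. The sketch below is an assumption of ours, not the author's method: it uses Python's standard `difflib` to estimate the share of words that changed, and the function names and sample texts are hypothetical.

```python
import difflib


def revision_share(draft: str, final: str) -> float:
    """Estimate the share of word-level content changed between draft and final.

    0.0 means the editor changed nothing; values closer to 1.0 mean a near rewrite.
    """
    matcher = difflib.SequenceMatcher(None, draft.split(), final.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def least_edited(drafts: dict, finals: dict) -> str:
    """Return the label whose draft needed the smallest share of changes."""
    return min(drafts, key=lambda label: revision_share(drafts[label], finals[label]))


# Hypothetical drafts and edited versions, for illustration only.
drafts = {
    "model_a": "the forest was quiet and he kept walking",
    "model_b": "the forest was very extremely quiet and he kept on walking",
}
finals = {
    "model_a": "the forest was quiet and he kept walking",
    "model_b": "the forest was quiet and he kept walking",
}

print(least_edited(drafts, finals))
```

A word-level diff is a blunt instrument: it counts a reordered sentence as heavily edited and says nothing about whether the changes were stylistic or factual, so it works best as a tie-breaker alongside an editor's own judgment.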