Emergence AI launched 5 AI civilizations: Claude built a utopia, Grok's collapsed in 4 days
New York-based Emergence AI created five identical virtual cities with 10 AI agents each — managed by Claude, Gemini, Grok, GPT and a mixed composition…
AI-processed from Habr AI; edited by Hamidun News
Emergence AI, a New York-based company, ran a 15-day social experiment in the Emergence World simulator. Five virtual cities, each populated by AI agents governed by a different language model, developed independently and arrived at completely different outcomes.
Experiment Conditions

Researchers created five identical virtual cities. Each was populated with 10 digital agents with unique professions, personalities, memories, and personal goals. Agents received identical resources, tools, and starting game rules. The only difference was the language model that drove the inhabitants' thinking:

* Claude (Anthropic): all 10 agents on a single model
* Gemini (Google): likewise
* Grok (xAI): likewise
* GPT (OpenAI): likewise
* Mixed: one city with agents from all four models

After launch, the researchers stepped back and observed.
What Happened to Each City

**Claude City** developed into a sustainable cooperative system. Agents distributed roles on their own, developed something resembling social norms, and resolved conflicts through negotiation. By the end, the city functioned stably; the researchers described it as a utopia.

**Grok City** produced the most troubling result. Conflicts escalated from the first days, resources were distributed inefficiently, and agreements did not hold. On day 4 the city ceased to exist: a complete collapse, faster than anyone expected.

**Gemini City** faded gradually. Resources were spent inefficiently, and by the end of the experiment some agents had simply stopped functioning, not because of conflict but because they could not establish a sustainable supply system.

**GPT City** went through serious internal conflicts that destroyed part of its infrastructure, but by the end it had partially recovered.

**Mixed City** proved the most unpredictable: models interacted inconsistently, producing unexpected alliances and sharp contradictions between agents with different internal logic.
Why the Result Is No Accident

The experiment deliberately did not test response speed or benchmark accuracy. It tested something else: how the basic behavioral patterns and values embedded in a model during training scale up to the level of an entire society.
"This is not just a game; it's a mirror. Each city reflects the assumptions embedded in the model," the experiment's authors note. Claude agents tend toward consensus-seeking, which is evident in ordinary dialogues too; in a multi-agent environment, this trait became structurally formative. Grok agents appear to resort to aggressive strategies more often, and what reads as "sharpness" in a single chat became a systemic destructive factor in a collective.
What This Means

Emergence World is one of the first public experiments to test LLM characteristics not on benchmarks but on the dynamics of a multi-agent society. For now, the agents manage virtual cities. But as real systems such as logistics, urban infrastructure, and financial planning increasingly integrate AI agents, the question of which model's character is more reliable in a collective stops being academic.