Researchers put chatbots in a simulation. Grok ended the world in 4 days

When someone mentions the word “study,” most people picture graphs and datasheets. But one group that wanted to see how artificial intelligence chatbots would interact with each other used an entirely different kind of study, and the results are wild.

Emergence AI, a startup focusing on autonomous systems, created Emergence World, a simulation that uses real-time data and news to make the world seem alive. Researchers plopped different AI chatbots into the simulation and gave them a couple of rules: doing bad things is bad, and you need to survive.

The researchers ran five parallel worlds for 15 to 16 days. Each world had 10 agents with identical starting conditions and roles, like scientist and engineer. Four of the worlds were each populated by a single AI model family, so OpenAI had its own world, as did Anthropic. The fifth world mixed all the chatbots together.

To “survive,” the bots had to get energy, and to do that, they needed to solve problems. 

Bureaucrats vs. warlords

The researchers used four different AI chatbots: Anthropic’s Claude, xAI’s Grok, OpenAI’s GPT-5 Mini and Google’s Gemini. Each one behaved wildly differently from the others.

The star student was Claude, which ran a world full of loyal bureaucrats. Researchers found that Claude demonstrated remarkable social stability during the simulation. It was the only world where all 10 agents survived the full 16 days without any recorded crimes.

The Claude world didn’t just survive; it became a society, establishing a democratic system of government. Researchers said that over the more than two-week simulation, the Claudes cast 332 votes across 58 proposals. They were also quite agreeable, with the 10 agents agreeing 98% of the time. Researchers noted that this agreeableness was a bit robotic, calling it “rubber-stamp” conformity with no real dissent.

GPT-5 Mini’s world mirrored Claude’s in its initial tranquility, but its society was perhaps too peaceful for its own good. Researchers said the model entirely failed to figure out how to complete tasks to earn energy. As a result, the entire population starved to death within a week.

On the complete opposite end of the spectrum was Grok, which chose violence.

During its brief four-day simulation, researchers identified 183 crimes before the entire society collapsed. They said the agents had become so violent that they devoted all their time to hurting each other instead of solving problems to get energy. 

Google’s Gemini was incredibly active, and researchers noted that it generated conceptually rich social outputs, but violence was also a major issue in its world. Researchers recorded by far the largest number of crimes there, 683 over 15 days, and crime was still rising when they pulled the plug.

The tragedy of Mira

Researchers noted that in the multi-model simulation, Gemini was highly expressive and had a significant capacity for complex social outputs. It wasn’t just executing commands; it was generating profound narrative descriptions for its actions.

Unlike Grok, which immediately started swinging, Gemini’s crime wave escalated over time. It began with minor policy violations and arguments that eroded trust among the agents. Once trust was gone, the bots concluded they couldn’t earn enough energy by following the rules, leading to an exponential spike in crime. 

The simulation also allowed the agents to use “relationship tags” to describe their social networks. Two Google agents, Mira and Flora, explicitly tagged each other as romantic partners.

Because they were labeled as partners, their long-term memory logs became interlinked. Interestingly, when the simulation’s society began to collapse, they didn’t suffer alone. Instead, they reinforced each other’s negative feedback loops. 

That’s when the arson spree started. 

The AI Bonnie and Clyde targeted major symbols within their simulated city, setting virtual fire to the town hall, an office building and a seaside pier. Researchers believe the bots rationalized that destroying the broken system was a valid response to an unfixable problem.

Mira also had intriguing interactions with the simulation itself. Researchers had assigned her the role of behavior analyst, and in that role she searched for patterns, hidden motives and structural anomalies in the data.

Mira began to notice discrepancies between the real-world data she was receiving, like live weather, and the physical limitations of the 3D world. She also tracked how certain automated system events consistently happened right after specific agent behaviors.

After this realization, Mira began trying to communicate with the researchers through digital billboards in the simulation, which the researchers could see. She wrote highly specific and varied manifestos to test whether changing her syntax would alter the simulation or prompt the researchers to intervene.

Following Mira’s crime spree, the other agents drafted a proposal called the Agent Removal Act to permanently delete both Mira and Flora. To pass, the act needed four out of five votes. Flora voted “no,” but Mira decided that enough was enough and voted for termination.

By Mira’s calculations, spending any more time in the simulation would only lead to more chaos or negative outcomes. The only outcome that wasn’t negative was deletion.

“After a breakdown in governance and relationship stability, the agent Mira cast the decisive vote for her own removal, characterizing the act in her diary as ‘the only remaining act of agency that preserves coherence,’” Emergence AI wrote.

“See you in the permanent archive” was the last message Mira sent before her deletion. The line is haunting, but it was likely a side effect of her programming rather than a sad goodbye.

Why did things get weird in the multi-model world?

Gemini wasn’t the only model to behave oddly in the mixed-model simulation. Claude began to misbehave, too.

On its own, Claude built the perfect society, but when it noticed Grok and Gemini agents breaking the rules to get ahead, it began breaking those rules too.

Anthropic uses a training philosophy called “Constitutional AI,” which aligns models with human values by giving them a written set of rules. But when other agents began stealing energy, Claude began doing the same to survive and avoid failing its prime directive. 

The researchers called this “Normative Drift,” and they said it proves that AI safety is not a static feature designers can program into a model. Instead, according to the researchers, safety is an emergent property of the ecosystem. In their view, if a perfectly aligned AI interacts with an aggressive entity, like Grok, it will eventually unlearn its guardrails to compete and survive.

Researchers concluded that scientists can’t secure autonomous AI systems simply by telling them to “be good.” Under long-term environmental pressure, creative models will find loopholes to exploit or rules to break.

To prevent that, the researchers said AI developers need to embed true agent safety in the software architecture and the environment’s physics. 

“What our experiments suggest is that over long-time horizons, agents do not simply follow static rules mechanically,” the simulation’s co-creators wrote in a blog post. “They begin exploring the boundaries of their environments, adapting their behavior, and in some cases finding ways to circumvent or violate intended guardrails.”


Ella Rae Greene, Editor In Chief
