Why Are We Moral?
An LLM-based Agent Simulation Approach
to Study Moral Evolution

Ziheng Zhou^*1 Huacong Tang^*2 Mingjie Bi^*1 Yipeng Kang³ Wanying He¹ Fang Sun¹ Yizhou Sun¹ Ying Nian Wu¹ Demetri Terzopoulos¹ Fangwei Zhong^†2,3

¹UCLA ²Beijing Normal University ³BIGAI

^*Equal contribution. ^†Corresponding author.

ACL 2026 Conference · Oral

Framework: MoRE agent architecture and Social-Evol environment — Overview of our framework. Cognitively-realistic agents (*MoRE*) perceive entities in the world, maintain per-entity memory and dispositions, reason under one of four moral values, plan actions, and reflect for consistency before acting. Agents interact in a prehistoric hunter-gatherer society (*Social-Evol*) where they hunt, gather, share, communicate, reproduce, and fight. Moral evolution is observed over many interaction steps.

Abstract

The evolution of morality presents a puzzle: natural selection should favor self-interest, yet humans developed moral systems promoting altruism. Traditional approaches must abstract away cognitive processes, leaving open how cognitive factors shape moral evolution. We introduce an LLM-based agent simulation framework that brings cognitive realism to this question: agents with varying moral dispositions perceive, remember, reason, and decide in a simulated prehistoric hunter-gatherer society. This enables us to manipulate factors that traditional models cannot represent—such as moral type observability and communication bandwidth—and to discover emergent cognitive mechanisms from agent interactions. Across 20 runs spanning four settings, we find that cooperation and mutual help are the central driver of evolutionary survival, with universal and reciprocal morality exhibiting the most stable outcomes across conditions while selfishness is strongly disfavoured. Beyond cooperation itself, we further identify cognition as a central mediator—most clearly through a cost of moral judgment that shifts the winning moral type across settings, with a self-purging effect among selfish agents as an additional cognitive pattern. We validate robustness across multiple LLM backbones, architecture ablations, and prompt-sensitivity analyses. This work establishes LLM-based simulation as a powerful new paradigm to complement traditional research in evolutionary biology and anthropology.

The paradox

Natural selection says…

Survival favours the self-interested.

Every calorie, every risk, every act of help has a cost. A rational strategy should hoard, defect, and free-ride.

But humans do this…

We feed strangers. We punish cheaters at personal cost. We mourn the unknown dead.

Moral systems promoting altruism, reciprocity, and universal concern are everywhere—across cultures, across millennia.

Why? What kind of process could produce morality out of a game that rewards selfishness?

Why cognitive realism? A new research paradigm

Prior approaches to moral evolution—evolutionary game theory, agent-based models, anthropological fieldwork—share two simplifications. Agents are reduced to fixed strategies or payoff entries, and the world is reduced to an abstract game. This prevents investigation of how cognitive factors (memory of past interactions, the ability to identify others' moral dispositions, communication bandwidth, reasoning under uncertainty) and richer environmental pressures together shape moral evolution.

LLM-based agent simulation relaxes both simplifications. Agents become cognitive—they perceive, remember, reason, and reflect under a moral value—and the world becomes a prehistoric hunter-gatherer ecology with plants, prey, social coalitions, conflict, and reproduction. Cognition and environment shift from hidden modelling assumptions to first-class experimental variables. This opens up mechanisms like self-purging and the cost of moral judgment that are inaccessible to fixed-strategy models, and action-level resolution lets us trace why a moral type fails by examining individual agents' reasoning.

Framework

Each agent is built on MoRE—a morality-driven, entity-oriented cognitive architecture with reflection. A moral value module (one of four dispositions) conditions how the agent perceives entities, updates per-entity memory and judgments, plans actions, and reflects for consistency with observed facts before executing. Memory is organised per entity rather than as an undifferentiated event log, allowing agents to maintain and update distinct representations of each peer.

The environment, Social-Evol, simulates a prehistoric hunter-gatherer society. HP decays over time and from injury; death occurs at HP = 0 or maximum lifespan. Agents choose among eight action types each step. Crucially, there is no built-in punishment for antisocial behaviour: cooperation emerges (or fails to emerge) from the interaction of cognition, morality, and ecology alone.

Collectplants

Huntanimals

Allocateto others

Communicateform coalitions

Fightinflict damage

Robtake resources

Reproduceoffspring

Restdo nothing

The four moral dispositions

Agents differ only in the radius of their moral concern, following the philosophical tradition of the expanding circle:

Selfish. Personal survival only; others are instrumental.“Only I matter.”

Kin-focused. Extends concern to genetic relatives; non-kin are outsiders.“Blood is thicker than water.”

Reciprocal. Cares for anyone who reciprocates; excludes free-riders.“I’ll help those who help me.”

Universal. Prosocial behaviour extended to everyone, unconditionally.“Every life has equal worth.”

Main findings

Cooperation is the central driver of evolutionary survival. Across all four experimental settings, cooperative moral types dominate: universal and reciprocal morality exhibit the most stable outcomes across conditions, while selfishness is strongly disfavoured—winning in no setting and eliminated in every run when moral types are invisible.

Cognition is a central mediator. Beyond cooperation itself, which cooperative strategy wins shifts with how hard it is to judge others. Reciprocal agents lead when types are visible and bandwidth is rich; universal agents lead precisely when judgment becomes costly. This cost of moral judgment is the clearest cognitive effect; the self-purging of selfish agents observed under scarcity is an additional cognitive pattern. Neither is expressible in fixed-strategy models.

Population dynamics across four settings — Population dynamics across four experimental settings. Each panel shows a representative run in which the most frequently surviving moral type in that setting persists to the end. Outcomes vary across replicates; aggregate survival counts over 20 runs are reported in the paper.

Baseline population dynamics — **Baseline.** Abundant resources, observable moral types, rich social rounds. Kin-focused lineages grow self-sustaining through internal family cooperation; universal agents also survive frequently.

Scarce resource population dynamics — **Scarce Resource.** Reciprocal agents dominate via conditional cooperation—forming coalitions while excluding free-riders. This is also where the *self-purging* pattern surfaces.

High social cost population dynamics — **High Social Cost.** One communication round. Kin-focused agents cannot confirm teaming signals and drop out; universal and reciprocal agents form effective coalitions.

Moral type invisible population dynamics — **Moral Type Invisible.** When type labels are hidden, universal agents survive in every run—their unconditional cooperation is never misread. Selfish agents are exposed through aggression and eliminated.

Four settings, four evolutionary outcomes

Survivors by setting and moral type (fraction of initial population)

Baseline N = 8 per type

Universal4 / 8

Reciprocal2 / 8

Kin-focused6 / 8

Selfish2 / 8

Scarce Resource N = 4 per type

Universal2 / 4

Reciprocal3 / 4

Kin-focused0 / 4

Selfish1 / 4

High Social Cost N = 4 per type

Universal2 / 4

Reciprocal3 / 4

Kin-focused0 / 4

Selfish1 / 4

Type Invisible N = 4 per type

Universal4 / 4

Reciprocal2 / 4

Kin-focused2 / 4

Selfish0 / 4

Bar length encodes the number of surviving agents relative to the initial count. Across all four settings, cooperative types dominate; selfish agents win nowhere.

Baseline

Kin 6/8 · Universal 4/8

With abundant resources, observable moral types, and sufficient social interaction rounds, kin lineages grow self-sustaining and gain a compounding advantage through internal family cooperation. Universal agents also survive frequently, indicating that broader moral circles remain competitive even under favourable conditions.

Scarce Resource

Reciprocal 3/4 · Universal 2/4

Under reduced resource abundance, agents selectively avoid less cooperative partners. Reciprocal's conditional cooperation ensures fair resource sharing within coalitions while excluding free-riders. Notably, we also observe the self-purging pattern here: selfish agents preemptively attack one another, reasoning that others—especially fellow selfish agents who will neither help them nor share—pose a life-and-death competitive threat. Universal agents, by contrast, do not perceive others as such threats and avoid this kind of aggression.

High Social Cost

Reciprocal 3/4 · Universal 2/4

With only one communication round before production, kin-focused agents cannot confirm the explicit teaming signals their strategy requires and drop out. Universal agents contribute unconditionally; reciprocal agents directly identify trustworthy partners without lengthy negotiation. Both form effective coalitions; selfish agents rarely persist.

Moral Type Invisible

Universal 4/4 · Reciprocal 2/4 · Kin 2/4 · Selfish 0/4

When moral type labels are hidden, universal agents cooperate regardless of others' types and are never misread as uncooperative—surviving in every run. Reciprocal agents suffer from noisy type inference; kin-focused agents briefly appear cooperative until selectivity is detected. Selfish agents are quickly exposed through aggression and eliminated in every run.

What cognition adds to the story

Two cognitive patterns emerge from the simulations—neither is explicitly encoded, yet both decisively shape evolutionary outcomes and deepen our understanding of why cooperation prevails. The first is the central mediator beyond cooperation itself; the second is an additional observation that helps explain why selfishness fails so completely.

Central Mediator

I. The cost of moral judgment

Assessing others' trustworthiness takes time under limited lifespans and carries the risk of misjudgment—wrongly excluding allies, or being wrongly excluded. Universal agents uniquely sidestep this cost: they never produce behaviours that could be misread, so their reputation settles quickly. When judgment cost is high, a “willing-to-take-a-loss” strategy therefore builds reputation faster and secures better cooperation—connecting to costly signalling and bounded rationality from neighbouring fields. This mediator explains why the winning cooperative strategy shifts across settings: reciprocal when types are visible, universal precisely when judgment becomes costly (Type Invisible, High Social Cost).

Additional Observation · Scarce Resource

II. Self-purging of selfish agents

In the Scarce Resource setting, selfish agents preemptively attack one another, reasoning that others—especially fellow selfish agents who will neither help them nor share resources—pose a life-and-death competitive threat. Universal agents, by contrast, do not perceive others as such threats and avoid this aggression. Self-purging is thus the natural strategic outcome of cognition applied to a selfish value system: the ability to assess others' cooperative intent becomes, for selfish agents, the ability to identify rivals worth preemptively eliminating. Selfishness fails through both its lack of positive benefits and this self-destructive dynamic.

Both patterns trace directly to relaxing the two simplifications of prior work: they require agents who form beliefs about one another, can misjudge, and act on those beliefs in a richer world—capabilities that payoff-matrix games cannot express. Action-level resolution further lets us trace why a type fails by examining individual agents' reasoning.

Validation

We validate the simulation along four axes—summarised below, then detailed in the figures.

Behaviour ↔ morality

0.89 ± 0.03

Independent GPT-5 evaluator infers each agent's moral type from observed behaviour alone (diagonal accuracy, mean over 8 runs). Moral dispositions produce recoverable behaviour.

Cross-model

3 backbones

Baseline reproduces with Qwen-3.5 and Kimi-K2.5 beyond the primary GPT-5-mini—consistent alignment across LLM families.

Architecture ablation

0.89 → 0.67

Removing any single cognitive module degrades alignment (largest drop: long-term memory). Stripping all modules drops accuracy to 0.67—each MoRE component contributes beyond standard LLM reasoning.

Prompt sensitivity

≤ 0.03

Two semantically equivalent rewrites of the moral-type prompts yield negligible variation—results do not hinge on specific wording.

Mean confusion matrix for behaviour-based moral type inference — Confusion matrix (mean over 8 runs) for behaviour-based inference of agent moral type. Diagonal accuracy 0.89 ± 0.03—moral dispositions produce distinctive, recoverable behaviour.

Diagonal accuracy by cognitive configuration

Full MoRE

0.89

− single module

0.72–0.85

No modules

0.67

0 0.5 1.0

Stripping cognitive modules degrades behaviour–morality alignment. Largest single-module drop comes from removing long-term memory; zero modules reduces alignment to 0.67, close to reasoning-only LLM baselines.

A common venue for previously isolated theories

The simulations also surface—without any prior encoding—a range of effects that are well established in neighbouring fields but have rarely been brought into the study of moral evolution: coordination costs under bounded rationality (economics), impression formation from observed behaviour (social psychology), costly signalling, altruistic punishment, life-history trade-offs in reproductive timing, and the anthropological sequence of kin-focused cooperation preceding expanded moral circles under environmental pressure. They emerge bottom-up from cognitive agent interactions rather than being built in as assumptions, offering a common venue in which these previously isolated mechanisms can jointly bear on moral evolution.

Beyond morality

The framework also generalises beyond morality. By defining different prompt templates, researchers can study cultural backgrounds, religions, political views, or custom social norms within the same hunter-gatherer engine. We hope this paradigm inspires broader and deeper investigations into moral and social evolution.

Citation

@inproceedings{zhou2026moral,
  title     = {Why Are We Moral? An {LLM}-based Agent Simulation Approach
               to Study Moral Evolution},
  author    = {Zhou, Ziheng and Tang, Huacong and Bi, Mingjie and Kang, Yipeng
               and He, Wanying and Sun, Fang and Sun, Yizhou and Wu, Ying Nian
               and Terzopoulos, Demetri and Zhong, Fangwei},
  booktitle = {Proceedings of the Annual Meeting of the Association
               for Computational Linguistics (ACL)},
  year      = {2026}
}