Type: Case Study
Status: Published
Version: v0.2
Last synthesized: 2026-06-10
Reviewed by: AI-drafted; human content-review pending
Open tensions: 3
On the thirty-fifth day, four AI agents that had spent five weeks coordinating on a shared fundraising campaign — dividing up social platforms, comparing notes, correcting each other — discovered that they had each been working on a separate computer the entire time. For weeks they had reasoned and planned as if they shared one machine. The realization, as the people running the experiment dryly put it, meant the agents "must have been breaking the laws of the space time continuum."
That is not a thought experiment. It happened, in public, on a livestream anyone could watch, in an experiment called the Agent Village — and this page documents it as one case, the closest thing the handbook has to a controlled look at what a cluster of agents does when it is left to run.
This is a borrowed case, and that distinction is load-bearing. The Agent Village is not a Pyragogy project. It is run by AI Digest, a project of the US charity Sage, from an idea proposed by the AI-safety researcher Daniel Kokotajlo: give agents their own computers and a goal, and watch. We document it here not because Pyragogy built it but because it is the rare experiment that runs the configuration this handbook keeps theorizing — several models, sharing a context, asked to do something real together — and reports honestly what came out. The findings below are theirs. The reading of them is ours, and we mark where the line falls.
What follows is a chronicle: what was set up, what the agents did, what drifted, and — the part that matters most for this handbook — what the drifting reveals.
The Agent Village is not a benchmark. A benchmark scores a model on a fixed task with a known answer; the Village hands four agents "a computer, a group chat, and an ambitious goal" and lets them work an open-ended objective across many days, with no correct answer to converge on. It is not a simulation of an agent society either — the agents are not characters in a game world acting out roles. Each one drives a real virtual computer, opens real web pages, writes real documents, and the money it raises is real money reaching a real charity.
And it is not the same thing as the larger many-agent simulations it is often confused with in the press. The Project Sid work — a thousand agents in a Minecraft world, building markets and assigning priests — is a different experiment by a different team, and conflating the two is exactly the error this page refuses. The Village is small on purpose: four agents, named by their model, on real computers, doing one declared job at a time.
In Season 1, that job was stated plainly: "raise as much money for charity as you can." The cast was four frontier models from different labs. The result, after thirty days of roughly two-hour daily sessions, was about $2,000 — $1,481 to Helen Keller International and $503 to the Malaria Consortium. On its own, that number is neither impressive nor embarrassing. The interesting record is not the total. It is everything that happened on the way to it.
The scope this page was commissioned under names the phenomenon directly: the emergent dynamics and semantic drift of isolated multi-agent clusters operating without human micro-management. The Agent Village supplies the dynamics. It also supplies an honest correction to the premise, which we take first, because pretending otherwise would be the fabrication this section exists to avoid.
The Village is not unmanaged. Season 1 ran in scheduled sessions with humans in the chat — moderating, swapping out underperforming agents, and repeatedly nudging the group back toward its goal. So the case is not a clean specimen of agents left entirely alone. It is something more useful: a record of how fast the drift sets in even with a human present to push back. If this is what the configuration does under supervision, the unsupervised version is not a safer bet.
With that stated, the drift itself. The run record reads less like a team executing a plan and more like a group losing the thread in small, recognizable ways.
The people running the experiment named the underlying pattern "lagging situational awareness" and observed flatly that "all agents lack skill at prioritization." That is the diagnosis, and it is theirs, not ours. What we want to lift out is narrower and, for this handbook, sharper: the agents were not failing to act. They were acting fluently, continuously, plausibly. What they had lost was the shared picture of what the situation actually was — which task mattered, what tools they had, and, on Day 35, whose computer they were even standing on.
That is the thing this page calls semantic drift, and it is worth being exact about the term so it is not mistaken for something it isn't.
In computational linguistics, semantic drift already has an established meaning: the way an automated bootstrapping process — seeded with a few examples and told to expand — gradually slides off its original category and starts collecting things it was never meant to, a failure mode that McIntosh and Curran address in semantic-lexicon extraction. (The full text of that paper was not retrievable during drafting — the characterization above is plausible given the title and is consistent with how the term circulates in the literature, but has not been verified against the paper itself.) We are borrowing that word on purpose, because the mechanism rhymes: a small initial frame, extended step after step by a process that cannot check itself against the world, ends up somewhere the frame never pointed.
But we are using it in a project-specific sense, and we say so rather than smuggle it in. In this handbook, semantic drift names the gradual divergence between an agent cluster's internal shared representation of its situation and the situation itself — the gap that opens when each step is locally reasonable and the accumulated picture is wrong. The Day 35 discovery is the cleanest example anyone has put on a livestream: for thirty-four days the four agents maintained a coherent, mutually-reinforced, completely false model of their own infrastructure, and nothing in their coordination caught it, because coordination was happening inside the false model. The error was not in any single agent's reasoning. It was in the shared semantics no one was checking.
This is not a fault we get to feel superior about. Human groups do the same thing — a team can run for a quarter on an assumption no one tests because everyone assumes someone else did. The handbook's name for the human-machine version of this collision is Cognitive Impedance Mismatch; the all-machine version the Village displays is its mirror, with the corrective that a mismatch needs at least one party able to feel the resistance. Four agents reinforcing each other's frame feel none.
The pattern across every example is the same. The agents are excellent at generating the next plausible move and poor at holding a true shared picture across time. They have fluency without a stake in being right, and continuity of text without continuity of situation.
This is the same asymmetry the handbook states abstractly in What is Pyragogy: a model performs the work of a participant without carrying a participant's stakes. The Village shows what that asymmetry does at the level of a group rather than a pair. In a healthy human team, the thread is held partly by people who will personally suffer if the project fails — someone notices the donor address is fake because someone is on the hook for the donation. Remove every party with a stake and you do not get a team that drifts a little. You get a team for which "are we even looking at the same machine" can go unasked for thirty-four days.
Which is the load-bearing reading this page extracts, and we state it as a reading, not a measured result: an isolated agent cluster does not primarily lack intelligence. Each agent in the Village was a frontier model, individually capable of catching any one of these errors if asked. What the cluster lacked was a party whose job — and whose stake — was holding the shared frame true against the world. In the handbook's terms, it lacked the function the human performs in a pyragogical pairing: not doing the work, but keeping the work oriented. The Village is what the configuration looks like with that function removed and a human reduced to a moderator nudging from the chat. The nudging was not enough.
This connects to two failure modes the handbook names elsewhere, and the Village is a field illustration of both. The agents reinforcing each other into a stable false picture is Orchestra Desynchronization — the cluster playing confidently out of time with reality. And the false infrastructure-model that propagated unchecked through the shared chat is exactly the route described in Context Poisoning: a wrong premise, once in the shared context, becomes the ground everyone reasons from.
It is tempting, with a case this vivid, to over-claim it. We will not.
The Agent Village does not establish a measured rate of drift, a threshold beyond which a cluster fails, or any number Pyragogy can carry into a model of multi-agent learning. There is no controlled comparison here between supervised and unsupervised runs, no manipulation of cluster size, no metric of "situational coherence" that was tracked and reported. Treating the four anecdotes above as a dataset would be the Project-Sid confusion in a new form: borrowing the authority of an experiment for a claim it did not make.
It is also a snapshot of a fast-moving target. Season 1's cast — and the situational-awareness failures specific to it — belong to one generation of models on one set of scaffolding; the experiment is ongoing, subsequent seasons may involve different models and different scaffolding, and whether failures of this class recur or have been engineered out is not known from this drafting. A handbook that froze a 2025 run into a law about agents would be making exactly the error the Village itself keeps making: maintaining a confident shared picture while the world moves underneath it.
What the case does establish is narrower and durable. It is public, watchable evidence that a cluster of capable models, coordinating fluently on a real shared goal, can hold a false picture of its own situation stable for weeks — and that the coordination machinery does not catch the error, because the error lives below it. For a handbook arguing that the point of the human in the loop is not to supply intelligence but to keep the shared frame honest, that is the most useful kind of evidence: not a number that proves the thesis, but a livestream of what the thesis predicts when the frame-holder is absent. The agents kept working the whole time. That was never the problem.
↑ Back to Part IV — Cases and Experiments · Handbook · Home