The Society of Thought: When a Field Station Director Beat Google to the Punch
This morning, browsing over coffee through my usual reading—including the brilliantly curated collection Abbas Raza assembles at 3 Quarks Daily—I encountered a hundred-page research paper from Google's Paradigms of Intelligence Team. "Reasoning Models Generate Societies of Thought" arrived with the weight of institutional authority: authors from Google, the University of Chicago, and the Santa Fe Institute; sophisticated mechanistic interpretability methods; structural equation models; analysis of 8,262 reasoning problems across six benchmark suites. The paper's central claim is genuinely fascinating: that advanced AI reasoning models don't succeed by thinking harder or longer, but by spontaneously simulating multi-agent dialogue—internal debates among distinct personalities that argue, question, and reconcile their way toward better answers.
My reaction, I confess, was less than gracious. I had built exactly this system nearly a year ago.
Let me explain what the Google team discovered, and then I'll tell you about my own version—a system I call SOCRATES that I assembled in February 2025 with nothing more sophisticated than PHP, MySQL, and a locally hosted language model.
The research paper documents something remarkable about how state-of-the-art reasoning models like DeepSeek-R1 actually work. When these models are trained through reinforcement learning—rewarded only for getting correct answers—they spontaneously develop what the authors call "conversational behaviors." The models begin asking themselves questions and answering them. They shift between perspectives. They generate conflicts between viewpoints and then reconcile them. Most strikingly, when the researchers used interpretability tools to peer inside the models' reasoning traces, they found distinct "personas" with measurably different personality traits arguing with each other.
This isn't metaphor. The researchers applied the Big Five personality assessment to the implicit voices within reasoning traces and found genuine variation in extraversion, agreeableness, neuroticism, and openness. The successful reasoning models had developed internal diversity—a parliament of perspectives that debate their way toward truth.
The theoretical implications are significant. The paper connects to Hugo Mercier and Dan Sperber's "argumentative theory of reason," which proposes that human reasoning evolved not for solitary truth-seeking but for persuading others and evaluating their arguments. Reason, in this view, is fundamentally social—it works best when there's someone to argue with. The Google team suggests that AI systems, when optimized purely for accuracy, independently converge on the same solution: simulate a society of minds.
This brings us to Marvin Minsky, who proposed in 1986 that intelligence itself emerges from the interaction of many simple agents, none of which is intelligent on its own. His "Society of Mind" was a theoretical framework, a way of thinking about cognition. What the Google paper documents—and what I stumbled into building—is that this framework can be made literal.
For the past couple of years I have been experimenting with small language models that run on my laptop or Mac mini server. They require so little memory and compute that you can run several instances, or several different models, side by side and code them to interact; I use Ollama to orchestrate them. What I noticed immediately was that these locally hosted models, while impressive, weren't as capable as the large foundation models from Anthropic or OpenAI. A single Ollama instance running Gemma or Llama felt like a bright undergraduate—eager and knowledgeable, but prone to confident errors.
This observation led to a simple question: if one LLM isn't particularly smart, what happens if you put several of them in a room together?
I spent a day building SOCRATES—a system that orchestrates multiple Ollama instances into structured debates. The architecture is almost embarrassingly simple: a PHP backend that manages conversation state, a personality system that lets me define distinct cognitive styles through prompt templates, and a JavaScript frontend that displays the evolving dialogue. Each personality gets its own system prompt defining its approach, its biases, its characteristic moves. When I pose a question, SOCRATES passes it to each participant in turn, with the full conversation history, so they can respond to each other's arguments.
I built a personality management system—a template for defining any cognitive style I wanted, saved to a library I could draw from. The interface lets me select up to eight personalities to participate in any given debate. My current roster includes the predictable (Scientist, Synthesis Expert, Zen Master) and the decidedly less predictable: a Monty Python Philosopher, a Pragmatic Pirate, Professor Farnsworth from Futurama for dialectical reasoning, and Louise Belcher from Bob's Burgers as the resident skeptic—a tribute to my granddaughter, who loves the character so much she sometimes wears the bunny ears. For the cosmology debate this morning, I chose three: Spock for analytical rigor, a Zen Master to challenge premises, and Leonardo da Vinci for systems thinking.
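For the curious, the core loop is small enough to sketch here. This is not my production code, just a minimal illustration of the pattern, assuming Ollama's standard HTTP chat endpoint on localhost:11434; the model name, the system prompts, and the question are placeholders (my real prompts are longer, and conversation state lives in MySQL rather than in memory).

```php
<?php
// Minimal round-robin debate loop against a local Ollama server.
// Assumes Ollama is listening on localhost:11434 and the model
// below has already been pulled; swap in whatever you run locally.

const OLLAMA_URL = 'http://localhost:11434/api/chat';
const MODEL      = 'llama3.1'; // placeholder model name

// Each personality is just a system prompt defining a cognitive style.
$personalities = [
    'Spock'      => 'You are ruthlessly analytical. Define terms, cite evidence, flag logical gaps.',
    'Zen Master' => 'You question the premises of every question before answering it.',
    'Leonardo'   => 'You think in systems, seeking syntheses that honor the other speakers.',
];

function askOllama(string $systemPrompt, string $userMessage): string {
    $payload = json_encode([
        'model'    => MODEL,
        'stream'   => false,
        'messages' => [
            ['role' => 'system', 'content' => $systemPrompt],
            ['role' => 'user',   'content' => $userMessage],
        ],
    ]);
    $ch = curl_init(OLLAMA_URL);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    if ($response === false) {
        return '';
    }
    return json_decode($response, true)['message']['content'] ?? '';
}

// One debate round: every personality sees the full transcript so far,
// so each can respond to the arguments already on the table.
$question   = 'What is the origin of the universe?';
$transcript = "Moderator: $question\n";

foreach ($personalities as $name => $systemPrompt) {
    $reply = askOllama($systemPrompt,
        $transcript . "\nRespond as $name, engaging the arguments above.");
    $transcript .= "\n$name: $reply\n";
    echo "$name: $reply\n\n";
}
```

Everything else is bookkeeping: storing personalities and transcripts in the database, and a frontend that renders the transcript as it grows. The essential trick is simply that each voice receives the full history before speaking.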
Here's a fragment of what they produced when I asked about the origins of the universe:
Spock began systematically: "To address this, we must first define 'origin.' Does it refer to the initial conditions of the observable universe, or to a causal explanation for why anything exists rather than nothing? Physics addresses the former with considerable precision. The Big Bang model, supported by cosmic microwave background radiation, describes the universe's evolution from approximately 13.8 billion years ago..."
The Zen Master interrupted the frame entirely: "To ask if physics or religion explains origin is to already assume an 'origin' to explain. But consider: the Big Bang model traces backward to a singularity where time, space, and physical law cease to apply. At that boundary, 'before' has no meaning. Physics does not explain the origin—it describes evolution within a context that had no prior context..."
Leonardo synthesized: "Both of you illuminate facets of this. Spock, your emphasis on empirical precision anchors us; Zen Master, your challenge to the frame prevents premature closure. I propose they are complementary lenses, not competing absolutes..."
This is exactly what the Google paper describes—question-answering sequences, perspective shifts, conflicts, and reconciliation—except that I built it with explicit agents rather than discovering it as emergent behavior within a single model. The outcome is strikingly similar: better reasoning through structured disagreement.
The Google team's contribution is demonstrating that this pattern emerges spontaneously when you optimize for accuracy alone. They didn't train models to debate; the models invented debate because it works. Their mechanistic interpretability analysis—using sparse autoencoders to identify features in the models' activation space—shows that "steering" models toward more conversational behavior directly improves their reasoning accuracy. The society of thought isn't a stylistic choice; it's a functional architecture.
What amuses me about all this is the divergence in methods. Google deployed teams from three institutions, ran experiments across 8,262 problems, used structural equation modeling to decompose behavioral pathways, and produced a hundred pages of rigorous analysis. I had a hunch, a day, and the KISS principle I learned from thirty-six years directing UC Natural Reserve System field stations: find the solution with the lowest cost, the least effort, and the best outcome.
Field station management teaches you something about problem-solving that academia sometimes forgets. When your pump house floods at 2 AM and you're an hour from the nearest town, you learn to build systems that are simple enough to understand, robust enough to fail gracefully, and repairable with whatever's at hand. SOCRATES reflects this philosophy—it's not elegant, but it works, I understand every line of it, and when something breaks, I can fix it over morning coffee.
The deeper point, though, isn't about who got there first or which method is superior. It's that the same insight emerged from radically different approaches: the society of thought isn't a discovery that required sparse autoencoders to find. It's an idea that's been lying around since Minsky, waiting for someone to take it literally. Google took it literally with sophisticated interpretability research. I took it literally with PHP and curiosity. Both paths led to the same place—the recognition that intelligence, artificial or otherwise, benefits from internal diversity, from voices that disagree and must be reconciled.
When I asked my SOCRATES personalities whether they had reached consensus on the origins question, their responses were characteristically themselves. Spock noted that "agreement in language does not guarantee shared understanding." The Zen Master observed that "consensus may be a concept we project." Leonardo suggested that "our agreement emerges not because we sought it, but because the inquiry demanded synthesis."
They're language models arguing about whether language models can agree. There's something delightfully recursive about that—and something genuinely useful. The debate produced a richer, more nuanced exploration than any single voice would have managed.
I don't know if this means anything profound about the nature of intelligence or the social origins of reason. What I know is that when I want to think through a hard problem now, I don't ask one AI—I convene a committee. Spock keeps me rigorous, the Zen Master keeps me humble, Leonardo keeps me creative, and Louise Belcher makes sure I'm not being fooled by my own cleverness.
Google needed a hundred pages to explain why. I just needed to watch them argue.
References
- Minsky, M. (1986). *The Society of Mind*. Simon & Schuster.
- Mercier, H., & Sperber, D. (2017). *The Enigma of Reason*. Harvard University Press.
- Kim, J., Lai, S., Scherrer, N., Agüera y Arcas, B., & Evans, J. (2026). "Reasoning Models Generate Societies of Thought." *arXiv:2601.10825*. https://arxiv.org/abs/2601.10825
- Raza, A. (2026). "A Computational Parallel to Collective Intelligence in Human Groups." *3 Quarks Daily*. https://3quarksdaily.com/3quarksdaily/2026/01/a-computational-parallel-to-collective-intelligence-in-human-groups.html