Two documents arrived in my inbox within the past 24 hours, and between them they bracketed a question I have been circling for some time.

The first was a paper called “First Proof,” posted to arXiv on February 6 by a team of mathematicians from Stanford, Columbia, Harvard, Yale, Berkeley, and several other institutions. They had assembled ten research-level mathematics problems — drawn from their own unpublished work — and encrypted the answers. The challenge: let the best AI systems in the world take a crack at problems whose solutions have never appeared on the internet. No retrieval tricks, no pattern-matching against the training corpus. Just reasoning. Their preliminary tests with GPT 5.2 Pro and Gemini 3.0 Deepthink suggested that, given a single shot, current systems struggle with most of the questions.

The second was a Substack post by Melanie Mitchell, published yesterday, reflecting on the late philosopher Brian Cantwell Smith. Mitchell had just spoken at a symposium honoring Smith, who held the Reid Hoffman Chair in Artificial Intelligence and the Human at the University of Toronto before his death last year. Her reflection centered on a distinction Smith drew in his 2019 book The Promise of Artificial Intelligence — the distinction between reckoning and judgment.

Reckoning, in Smith’s framework, is calculative prowess: the capacity to manipulate symbols, solve equations, generate fluent text, predict protein structures. AI systems already excel at reckoning. Judgment is something else — a form of deliberative thought grounded in ethical commitment and responsible action, appropriate to the situation in which it is deployed. Smith’s fear, which Mitchell shares, was not that machines would become too intelligent but that we would rely on reckoning systems in situations that demand genuine judgment, and that by being dazzled by reckoning prowess, we would begin to expect human thinking to work the same way.

I read both pieces over coffee at six-thirty in the morning, which is when I do my best thinking. Then I checked my SOMA dashboard.

SOMA — Stochastic Observatory for Mesh Awareness — is a system I have been building at the Canemah Nature Laboratory as part of the Macroscope, my long-running project integrating sensors, AI, and ecological observation. SOMA uses Restricted Boltzmann Machines — a form of neural network rooted in statistical thermodynamics — to learn the normal patterns of my local environment. One mesh trains on weather data from a Tempest station: temperature, humidity, pressure, wind, solar radiation. Another trains on species detections from BirdWeather acoustic monitors. A third integrates across both, coupling the atmospheric and biological into a single learned model.
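For readers who want a concrete picture, here is a minimal sketch of how one such mesh could be built, assuming the sensor readings are binned into binary features and modeled with a plain Bernoulli RBM trained by one-step contrastive divergence. The class and its names are illustrative, not the actual SOMA code.

```python
import numpy as np

class BernoulliRBM:
    """Minimal Restricted Boltzmann Machine over binarized sensor features."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def free_energy(self, v):
        """Low energy means the pattern matches what the mesh has learned."""
        v = np.atleast_2d(v)
        visible_term = v @ self.b
        hidden_term = np.logaddexp(0.0, v @ self.W + self.c).sum(axis=1)
        return -(visible_term + hidden_term)

    def fit(self, data, epochs=20, lr=0.05):
        """One-step contrastive divergence (CD-1) on binary feature vectors."""
        for _ in range(epochs):
            for v0 in data:
                ph0 = self._sigmoid(v0 @ self.W + self.c)
                h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
                pv1 = self._sigmoid(h0 @ self.W.T + self.b)
                ph1 = self._sigmoid(pv1 @ self.W + self.c)
                self.W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
                self.b += lr * (v0 - pv1)
                self.c += lr * (ph0 - ph1)
```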

The system runs 273 inferences a day, every fifteen minutes, and reports what it finds as energy values and z-scores. When the pattern of inputs matches what the mesh has learned to expect, energy is low. When something is off — when the relationships between variables violate the mesh’s learned model — energy spikes, and the system flags an anomaly.
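The scoring step, in the same illustrative terms: compare the current window's energy against a baseline of recent, presumed-normal windows and express the difference as a z-score. The three-standard-deviation cutoff below is my placeholder, not necessarily the threshold SOMA uses.

```python
def anomaly_zscore(rbm, baseline, current, threshold=3.0):
    """Flag a window whose free energy sits far outside the learned baseline.

    baseline: 2D array of recent, presumed-normal feature vectors
    current:  1D feature vector for the window being scored
    """
    baseline_energy = rbm.free_energy(baseline)
    mu = baseline_energy.mean()
    sigma = baseline_energy.std() + 1e-9   # avoid division by zero
    energy = float(rbm.free_energy(current)[0])
    z = (energy - mu) / sigma
    return energy, z, bool(z > threshold)
```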

That morning, everything was quiet. All three meshes reading normal. But the anomaly log from the previous afternoon caught my eye: at 4:45 PM, the BirdWeather mesh had flagged something it labeled “unusual boldness.” A raptor had been detected, but the songbirds were still singing. The mesh had learned that when a Cooper’s Hawk shows up, the towhees and juncos go silent. Yesterday, they did not. The mesh did not know why this was interesting. It only knew that it was.
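To make the "unusual boldness" case concrete, here is a toy version of what the acoustic mesh would have seen, using hypothetical binary features (raptor detected, towhee vocal, junco vocal) rather than the real BirdWeather feature set. The anomalous window pairs a hawk detection with continued songbird vocalization, a combination absent from the training history, so its energy should come out above the baseline.

```python
import numpy as np

# Hypothetical features per 15-minute window:
# [cooper's hawk detected, spotted towhee vocal, dark-eyed junco vocal]
history = np.array(
    [[0, 1, 1],    # no raptor, songbirds singing (typical)
     [1, 0, 0]]    # raptor present, songbirds silent (typical)
    * 100, dtype=float)

mesh = BernoulliRBM(n_visible=3, n_hidden=4)
mesh.fit(history)

unusual_boldness = np.array([1.0, 1.0, 1.0])   # hawk present, towhees still singing
energy, z, flagged = anomaly_zscore(mesh, history, unusual_boldness)
print(f"energy={energy:.2f}  z={z:+.1f}  anomaly={flagged}")
```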

Here is where Smith’s distinction becomes useful.

The First Proof experiment tests reckoning at its absolute ceiling — the most demanding symbolic reasoning humans can produce, handed to AI in perfectly specified form. The questions arrive pre-registered, to use Smith’s term. Someone already did the profound cognitive work of finding these problems intelligible, of carving them out of the mathematical landscape as things worth asking. The models just have to compute.

SOMA operates at the opposite end. It receives the unregistered world — raw sensor flux, species presence and absence, temperature gradients — and learns to notice when something does not cohere. It cannot formulate a theorem. But it can register surprise, which is something the math models, for all their sophistication, are never asked to do.

Neither system does what Smith would call judgment. But they fail in interestingly different directions. The math models fail because the reasoning is too hard. SOMA does not fail at reasoning — Boltzmann machines are mathematically trivial compared to those ten problems. It succeeds at noticing and then stops, because it has no capacity to interpret what it has noticed.

Mitchell quoted Smith on this: “Taking the world to consist of discrete intelligible mesoscale objects is an achievement of intelligence, not a premise on top of which intelligence runs.” She also recalled a wonderful paper by Deb Raji and others that compared AI benchmarks to a Sesame Street book in which Grover visits a museum claiming to contain everything in the world. He tours every room, and then he comes to a door labeled “Everything Else.” He opens it and finds himself outside.

This is the door that SOMA cannot open. The mesh flags unusual boldness and then waits. It takes a field ecologist standing on the deck with thirty-six years of watching birds to interpret what the silence — or in this case, the absence of silence — might mean. Maybe the hawk is a juvenile and the songbirds have not yet learned to take it seriously. Maybe there is a second predator the microphones did not catch. Maybe the towhees are simply having a bold day. The hypotheses require judgment — commitment to the world, understanding of context, willingness to be wrong.

I do not expect AI to cross this gap. What I have found, after months of building SOMA and years of working with sensor networks before that, is something more practical and, I think, more honest. AI excels at accelerating the reckoning layer, which in turn multiplies the occasions for human judgment. Without SOMA, I get an anomaly signal only if I happen to notice the towhees singing while a hawk is perched in the Douglas fir. With SOMA running continuously across three meshes, I get a stream of invitations to pay attention. The system triages. It says: this moment deserves your scarce expert attention more than that moment. And scarcity of expert attention — not scarcity of data, not scarcity of compute — is the actual bottleneck in ecological observation.
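The triage itself is almost embarrassingly simple once the meshes have done their work. Something like the sketch below, with made-up flags standing in for a real day's log, captures the idea: rank the flagged moments by how surprised the mesh was, and spend the limited expert attention at the top of the list.

```python
from dataclasses import dataclass

@dataclass
class Flag:
    timestamp: str
    mesh: str
    label: str
    zscore: float

def triage(flags, attention_budget=3):
    """Rank flagged moments by surprise; the expert looks at the top few."""
    return sorted(flags, key=lambda f: f.zscore, reverse=True)[:attention_budget]

# Made-up flags for illustration, not an actual SOMA log
day = [
    Flag("09:15", "weather",     "pressure and wind out of step", 3.2),
    Flag("12:30", "coupled",     "midday chorus during heat spike", 4.1),
    Flag("16:45", "birdweather", "unusual boldness", 5.7),
]
for f in triage(day):
    print(f"{f.timestamp}  z={f.zscore:.1f}  {f.mesh}: {f.label}")
```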

Some people argue that slowing down decision-making is always better. I think that leaves out too much. Ecology is not one high-stakes decision. It is ten thousand small noticings that accumulate into understanding. Accelerating the reckoning layer means more of those noticings happen before the season turns, before the migration window closes, before the phenological moment passes unrecorded.

I mentioned all of this to Merry, my partner in Bellingham, who asked the question she always asks when I get excited about something: “Why would you want to know that?” At first her question feels strange, because I am the sort of person who always wants to know why. But Merry’s question is actually more Smithian than she realizes. She is asking about commitment — what project in the world does this knowledge serve? What is at stake?

The thing is, Merry thinks nothing of picking up binoculars and scanning for birds without any particular reason. And since I installed three BirdWeather sensors at her house, she has become a relentless critic of the system’s confidence scores. She checks every detection she finds dubious, which is most of them. “That’s not a Varied Thrush” — and out come the binoculars for proof.

She is, without knowing it, running her own SOMA. Sensor detects, confidence score triggers skepticism, expert judgment deployed via optics, ground truth established. She is the human in the loop who makes the reckoning honest. She does not need a philosophical framework for this. She has birds.
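Her protocol fits in a few lines, if you wanted to write it down. The cutoff and field names below are invented for illustration; this is not BirdWeather's actual data format.

```python
def review_queue(detections, confidence_cutoff=0.7):
    """Anything below the cutoff goes to the human with the binoculars."""
    return [d for d in detections if d["confidence"] < confidence_cutoff]

# Invented detections for illustration
detections = [
    {"species": "Varied Thrush",  "confidence": 0.62, "station": "bellingham-1"},
    {"species": "Spotted Towhee", "confidence": 0.94, "station": "bellingham-2"},
]

for d in review_queue(detections):
    # The judgment step happens outside the code: binoculars, then a verdict.
    d["ground_truth"] = None   # filled in by the observer
    print(f"check: {d['species']} at {d['station']} ({d['confidence']:.0%})")
```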

Brian Cantwell Smith worried that we would mistake reckoning for judgment and build a world that could not tell the difference. I share that worry. But sitting here in the pre-dawn of this morning — the mesh humming, the River roaring over the falls in the background, the anomaly log from yesterday still glowing on the screen — I think the more interesting question is not whether AI will ever achieve judgment, but whether we can build systems that create more space for the judgment we already have.

The door marked “Everything Else” opens onto the actual world. We are the ones who have to walk through it. But there is nothing wrong with a system that tells you which door to try.