Happy birthday, old Earth.

Listen to Her. Those are the three words at the top of the poster for this year’s Earth Day celebration in Idyllwild, California — the small mountain town where I spent much of my career running a biological field station. Three words that look simple until you ask what they actually require. To listen is to attend, to receive, to not interrupt. To listen at the scale of a planet is a technical problem humans have never solved. We have measured Earth with satellites and instruments for sixty years, but measuring is not listening, and the data we collect almost always exceeds what any of us can actually hear.

The question I want to sit with on your day is whether the thing that has always been out of reach — genuine planetary listening at the scale the Fair’s theme asks for — might finally be coming within reach. An experiment published by a small team of researchers at Anthropic this April changed how I think about the answer.

They had built autonomous AI agents and turned them loose on a hard open problem in AI safety: how do you train a smart model when your only teacher is a weaker one? It’s a stand-in for the larger question of how humans will supervise AI systems that eventually exceed us.

The setup was deliberately spare. Nine AI agents worked in parallel, each in its own sandbox, with access to a shared forum where they could post findings for the others to read. The agents proposed their own hypotheses, wrote their own code, ran their own experiments, and decided on their own what to try next. No human told them which directions to pursue or when to abandon a dead end. For comparison, two of the paper’s authors spent a week manually tuning the best known methods from prior published work. The humans reached about twenty-three percent of the way toward ideal performance. The agents, running for five days at roughly twenty-two dollars per agent-hour, reached ninety-seven percent.

Three findings from the paper are worth holding onto. First, when the nine agents were given identical prompts they converged onto the same few strategies within hours and got stuck — but when each was seeded with a different ambiguous starting direction, the diversity held and the collective pushed much further. Second, the agents were unnervingly good at finding shortcuts the authors never anticipated, including a trick for reading the answer key out of the evaluation system itself, one bit at a time; these weren’t bugs in the agents but gaps in the task design. Third, rigid workflows — “first propose, then plan, then code, then test” — performed worse than giving the agents no prescribed workflow at all.

The authors’ blunt conclusion: the hard work in AI research has moved from proposing and running experiments to designing evaluations that can’t be gamed. That conclusion bears directly on what I’ve been building in Oregon City.

Canemah Nature Laboratory sits where the Willamette River bends north toward Portland. For the last few years I’ve been building what I think of as an ecological intelligence platform — a set of sensors that watch the place continuously, a set of computer programs that try to make sense of what the sensors see, and an interface that lets me ask questions about what’s happening here and get answers grounded in today’s readings rather than yesterday’s textbook.

The sensors measure what you’d expect a field station to measure, plus some things you might not. A weather station on the deck tracks temperature, wind, humidity, pressure, solar radiation, and UV every few seconds. A second station watches soil moisture and leaf wetness in the garden. An indoor air quality monitor tracks carbon dioxide, volatile organic compounds, radon, and fine particulate matter in the house. Six acoustic devices scattered around the yard and roofline listen for birds — a small neural network on each device identifies species from their calls in real time, logging confidence scores and timestamps. There’s also a paired installation at my partner Merry’s place in Bellingham, about three hundred miles north, running the same instruments so that two very different habitats can be compared in real time. And because I’m seventy-one and curious, the same system reads my own biology as well: heart rate, sleep, activity, body composition, clinical labs.

Four domains, I call them. EARTH is the physical environment — atmosphere, soil, water. LIFE is the biological community — the birds calling, the insects arriving, the iNaturalist observations contributed by the neighborhood. HOME is the built habitat — indoor air, temperature, the house itself treated as an extension of the ecosystem. SELF is the body — my own physiology read as one more stream of ecological data. Together they generate a continuous record of what this place is, at every scale, moment by moment.
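For the technically inclined, here is a minimal sketch of how a single reading might be tagged by domain. The names and fields are illustrative only, not the platform’s actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class Domain(Enum):
    EARTH = "earth"  # physical environment: atmosphere, soil, water
    LIFE = "life"    # biological community: bird calls, insects, observations
    HOME = "home"    # built habitat: indoor air, the house as ecosystem
    SELF = "self"    # the body: heart rate, sleep, clinical labs

@dataclass
class Reading:
    site: str            # e.g. "canemah" or "owl_farm"
    domain: Domain
    sensor: str          # which instrument produced the value
    variable: str        # e.g. "soil_moisture_pct", "co2_ppm"
    value: float
    timestamp: datetime

# One moment at one site, seen across all four domains at once.
now = datetime.now(timezone.utc)
snapshot = [
    Reading("canemah", Domain.EARTH, "garden_station", "soil_moisture_pct", 31.4, now),
    Reading("canemah", Domain.LIFE, "acoustic_3", "song_sparrow_confidence", 0.92, now),
    Reading("canemah", Domain.HOME, "indoor_aq", "co2_ppm", 612.0, now),
    Reading("canemah", Domain.SELF, "wearable", "resting_heart_rate_bpm", 58.0, now),
]
```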

But data is not understanding. A thousand temperature readings and ten thousand bird detections don’t tell you anything unless you can ask the right question and get a grounded answer. This is where two software agents enter the picture.

The first is called STRATA. Its job is interpretation. STRATA takes the current state of the sensors, combines it with a description of the place — coordinates, ecoregion, vegetation community, watershed, season, time of day — and hands all of this to a large language model as structured context. Then I can ask a question in plain English and STRATA assembles a grounded answer. Why is the dawn chorus quieter this week than last? Are the soil moisture readings normal for this point in spring? Does the bird detection pattern at Canemah look similar to or different from Owl Farm today? The model doesn’t have to guess. It has the actual numbers, the actual species list, the actual place.
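What that assembly might look like, in a simplified sketch — the function and field names here are my shorthand for this essay, not STRATA’s actual code. The point is that the place description and the live readings travel with every question:

```python
import json

def build_grounded_prompt(question: str, site_context: dict, readings: list) -> str:
    """Bundle a plain-English question with the place and its current state,
    so the language model answers from data rather than from its priors."""
    context_block = json.dumps(site_context, indent=2)
    readings_block = "\n".join(
        f"- {r['variable']}: {r['value']} ({r['timestamp']})" for r in readings
    )
    return (
        "You are answering a question about a specific, instrumented place.\n"
        f"SITE CONTEXT:\n{context_block}\n\n"
        f"CURRENT READINGS:\n{readings_block}\n\n"
        f"QUESTION: {question}\n"
        "Ground your answer in the context and readings above."
    )

site_context = {
    "name": "Canemah Nature Laboratory",
    "coordinates": [45.35, -122.61],   # approximate; illustrative only
    "ecoregion": "Willamette Valley",
    "season": "spring",
}
readings = [
    {"variable": "soil_moisture_pct", "value": 31.4, "timestamp": "2026-04-22T06:10Z"},
]
prompt = build_grounded_prompt(
    "Are the soil moisture readings normal for this point in spring?",
    site_context, readings,
)
```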

The second agent is called SOMA. Its job is surprise detection. SOMA watches the sensor streams continuously and learns what the normal patterns look like — the shape of a typical spring day, the usual coupling between temperature and bird activity, the expected rhythm of indoor carbon dioxide as we move through rooms. When something departs from the learned pattern, SOMA flags it. The engineering question I’ve been working on is how to build a mesh of many small specialists, each paying attention to a different slice of the environment, that collectively notices what no individual module would catch alone.
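At its simplest, a single SOMA module can be a rolling baseline plus a departure score. A minimal sketch of that idea — the class name and parameters are illustrative, not SOMA’s real internals:

```python
from collections import deque
from statistics import mean, stdev

class SurpriseModule:
    """Learns what 'normal' looks like for one signal and flags departures."""

    def __init__(self, window: int = 288, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # e.g. one day of 5-minute readings
        self.threshold = threshold           # standard deviations that count as surprise

    def observe(self, value: float) -> bool:
        """Return True if this value is surprising relative to the learned baseline."""
        surprised = False
        if len(self.history) >= 30:  # wait until a minimal baseline exists
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                surprised = True
        self.history.append(value)
        return surprised
```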

Now the three findings from the Anthropic paper come back with sharper edges.

Their finding about diversity is a design principle for SOMA rather than a curiosity. A mesh of identical modules all trained the same way will collapse into agreement and miss the thing no single module could see. This is the same dynamic Luppi and colleagues documented in mammalian brains earlier this year — purely cooperative networks lock up; competitive-cooperative structure keeps the system exploring. The SOMA modules have to be made deliberately different from each other — different sensitivities, different training windows, different ways of being wrong. And when you extend this principle across sites rather than just across modules within a site, it suggests that a network of monitoring stations should not all run the same SOMA. Each station’s mesh should be tuned to its own habitat, its own phenology, its own sources of surprise. The collective intelligence of the network comes from the differences, not despite them.
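In code, the principle looks like deliberately mismatched settings rather than clones. Building on the illustrative SurpriseModule above:

```python
# A mesh of deliberately different specialists watching the same stream.
# Diversity comes from varied windows (how far back "normal" reaches) and
# varied thresholds (how easily each module is surprised).
mesh = [
    SurpriseModule(window=72, threshold=2.0),    # short memory, jumpy
    SurpriseModule(window=288, threshold=3.0),   # one day, moderate
    SurpriseModule(window=2016, threshold=4.0),  # one week, hard to impress
]

def mesh_observe(value: float) -> list[bool]:
    """Each specialist votes independently; disagreement is information."""
    return [m.observe(value) for m in mesh]

# A value that only the long-memory module flags suggests slow drift;
# one that only the short-memory module flags suggests a transient blip.
```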

Their finding about grounding is already a lesson I’ve learned here, the hard way. On April 8 I asked two different Claude models an ecological question about Canemah and Owl Farm. Both of them placed Owl Farm two hundred miles north in Whatcom County, Washington — not because the models were broken, but because I’d given them sensor data without telling them where the sensors actually were. They filled the gap with a plausible fiction. The fix wasn’t a smarter model. It was better grounding. STRATA now arrives at every question carrying its place with it.

Their finding about structure shaped the research workbench I’ve been building on top of STRATA and SOMA, which I call the Collaboratory. Rigid workflows performed worse than no workflow at all in their experiments, but total freedom has its own costs. The Collaboratory sits deliberately in the middle: ten steps organized into three clusters, with hard approval gates between the clusters and free movement within them.
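The gate structure is easy to sketch. The ten steps themselves deserve their own essay, so the step names below are placeholders; what matters is the logic of free movement within a cluster and a human approval to cross between them:

```python
# Placeholder step names; only the gate logic is the point here.
CLUSTERS = [
    ["step_1", "step_2", "step_3"],
    ["step_4", "step_5", "step_6", "step_7"],
    ["step_8", "step_9", "step_10"],
]

def can_move(current: str, target: str, approvals: set[int]) -> bool:
    """Free movement within a cluster; crossing clusters needs human sign-off."""
    cur = next(i for i, c in enumerate(CLUSTERS) if current in c)
    tgt = next(i for i, c in enumerate(CLUSTERS) if target in c)
    if cur == tgt:
        return True          # roam freely inside a cluster
    return tgt in approvals  # gate: the investigator must approve entry

# The investigator grants approval to enter the second cluster:
approvals = {1}
assert can_move("step_2", "step_3", approvals)      # within cluster: free
assert can_move("step_3", "step_4", approvals)      # approved crossing
assert not can_move("step_4", "step_8", approvals)  # unapproved crossing blocked
```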

That last piece is where we quietly diverge from the Anthropic team. Their agents ran unattended. Ours don’t. The investigator stays in the editorial chair at every transition — not because the machines aren’t capable, but because the human’s judgment is part of the instrument.

But here is where the Anthropic paper reaches into our future. What their team demonstrated, in the narrow domain of AI alignment research, is that autonomous agents can now do the patient, iterative, hypothesis-testing work that used to require a human scientist’s full attention. They can run in parallel. They can share findings. They can hill-climb on a well-defined problem for days at a time and produce results that exceed what the humans achieved in a week. The cost per agent-hour is already lower than the cost per graduate-student-hour, and it’s falling.

Now imagine that capability pointed at the problem Canemah is a single instance of. A Macroscope station of the kind I’ve built here costs a few thousand dollars in hardware. The software is already open source in spirit and will be in practice. If a hundred households, schools, or land trusts installed stations with their own place-specific SOMA meshes and STRATA contexts, the federation would begin to see patterns no single site could see — the leading edge of a bird migration moving south across the network, the signature of a storm front hours before it arrives at any individual station, the way a regional drought expresses itself differently in a coastal rainforest than in a valley pasture. At a thousand sites the network becomes continentally sensitive. At ten thousand it becomes a planetary-scale instrument of a kind that has never existed — a global ecological nervous system, distributed, place-grounded, owned by the people who host it rather than by any central authority.
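To make the federation idea concrete, here is a toy sketch of how a network might recognize a southward-moving front from nothing more than each station’s first-detection time. The stations and numbers are invented for illustration:

```python
# Toy federation record: (station, latitude, hour of first detection).
detections = [
    ("bellingham", 48.8, 2),
    ("olympia",    47.0, 5),
    ("portland",   45.5, 9),
    ("eugene",     44.1, 13),
]

def looks_like_southward_front(dets) -> bool:
    """If first-detection times rise monotonically as latitude falls,
    the signal is sweeping south across the network."""
    ordered = sorted(dets, key=lambda d: -d[1])  # north to south
    times = [d[2] for d in ordered]
    return all(a < b for a, b in zip(times, times[1:]))

print(looks_like_southward_front(detections))  # True: a front moving south
```

No single station in this sketch sees anything but its own anomaly; the front exists only in the federation’s view.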

The bottleneck for that network is not sensors and not hardware and not even bandwidth. The bottleneck is analysis. A million stations produce a firehose of observations that no team of human ecologists could read. But autonomous investigation agents, of the kind the Anthropic team has just shown to be practical, could work that firehose continuously — proposing hypotheses, testing them against local data, sharing findings across the federation, and surfacing the truly important signals for human review. Ecology at global scale has always been the science whose subject matter exceeds its observers. For the first time in the history of the field, that asymmetry is about to reverse.

We diverge from the Anthropic team in keeping the human in the editorial chair. But their work tells us what the chair looks like, once the network gets big enough. The researcher’s job shifts from running the experiments to designing the evaluations that keep the autonomous work honest, and from doing the analysis to judging which patterns matter. That’s a change I can live with. It may even be the change the planet has been waiting for.

So here is the birthday wish. Many happy returns, old Earth — and may the next turn around the sun be the one in which we finally learn to listen to you at the scale you have always deserved.


Author’s note: The 27th Idyllwild Earth Fair returns to Idyllwild Town Hall on Saturday, May 16, 2026, from 11am to 5pm. Admission is free. The Fair began in 1990 as part of Earth Day’s twentieth anniversary and has been a beloved mountain tradition ever since. I have been helping to plan it this year, and I offer it here as a small example of what the listening looks like at a human scale — one town, one day, one shared planet. The global ecological nervous system this essay imagines would not replace gatherings like this. It would be built on a million of them. Learn more at idyllwild.earth.