Epistemic Culture

Status: early draft, adapted from the 2021–22 estimation-theory notes.

The claim

Many of the real limits on evaluation systems are cultural, not technical. No matter how good the tooling, a system has no impact if the community around it is too uncomfortable to use it. Culture is part of the background an evaluation system runs on — not a component you can build, but a precondition you have to cultivate (see The Four Components).

This is a strong claim, and it might be wrong (it’s listed as a crux). But if it’s right, it reorders priorities: the highest-leverage work isn’t a better forecasting algorithm, it’s making candid, public, quantified judgment socially survivable.

The candidness problem

The sharpest version is what you might call the candidness problem: the moment an evaluation starts to matter, the incentive to be honest in it collapses.

A worked example from the original notes: certificates of impact require estimating the value of many charitable interventions. But if an organization knows funders are watching the value of its certificates closely, it becomes wary of issuing certificates for anything but its very best work, because a mediocre rating is now a liability. The act of measuring distorts the thing measured, through the feelings and incentives of those being measured.
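To make the distortion concrete, here is a minimal toy sketch of the certificate example. It is an illustration under stated assumptions, not a model taken from the notes: the project values, the 0-10 scale, the `issued_certificates` helper, and the hard `wariness_threshold` cutoff are all hypothetical.

```python
from statistics import mean


def issued_certificates(project_values, watched, wariness_threshold):
    """Return the values of the projects an organization chooses to certify.

    Hypothetical decision rule (an assumption for illustration only):
    - unwatched: the organization certifies all of its work;
    - watched: it certifies only work at or above the threshold,
      because a mediocre public rating is a liability.
    """
    if not watched:
        return list(project_values)
    return [v for v in project_values if v >= wariness_threshold]


# Made-up project values on an arbitrary 0-10 scale.
projects = [2, 3, 4, 5, 5, 6, 7, 9]

unwatched = issued_certificates(projects, watched=False, wariness_threshold=7)
watched = issued_certificates(projects, watched=True, wariness_threshold=7)

# Share of the organization's work that funders can still see.
coverage = len(watched) / len(projects)

print("mean certified value, unwatched:", mean(unwatched))
print("mean certified value, watched:  ", mean(watched))
print("share of work certified when watched:", coverage)
```

In this toy setup the funders who watch closely see a higher average rating and a much smaller share of the organization's actual work, so closer scrutiny leaves them knowing less, not more. Real behavior would presumably be softer than a hard cutoff, but the direction of the effect is the point.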

This generalizes: any evaluation system pointed at people or organizations whose reputations are at stake will face pushback, strategic non-participation, and pressure to soften or suppress unflattering outputs.

The rollout problem

Now scale it up. Imagine an agency that, starting tomorrow, published “pretty good” estimates of the impact of every politician, bill, organization, and individual. Even granting the estimates were sound:

The disruption would be enormous and the pushback fierce.
The agency would be a magnet for libel suits and political pressure.
It would likely be shut down or captured before it stabilized.

So the problem isn’t only “can we produce the evaluations”; it’s also “can we deploy them without the system being destroyed on contact.” That is a sequencing and rollout problem: which evaluations to publish first, how transparent to be and how quickly, and how to balance information value against the comfort of the evaluated. The notes suggest starting with custom, lower-stakes systems and taking carefully chosen steps toward transparency, rather than switching on full public ratings all at once.

Cultural engineering

If culture is the bottleneck, it’s also a design surface. Some directions:

Norms of candidness and truth-seeking — communities where honest negative assessments are expected and tolerated, not punished.
Graduated transparency — rolling out from private to semi-public to public as trust and norms develop, rather than all at once.
Comfort-aware design — explicitly trading some information value for reduced offense early on, to keep a system alive long enough to mature.
Small-group experimentation — testing cultural interventions in small, willing communities before deploying them in higher-stakes settings.

Why this is tractable (maybe)

The optimistic case for working on culture: other constraints (talent, funding, institutional buy-in) are often less mutable than culture, and many of the cheapest wins are specifically cultural. The pessimistic case: culture is famously hard to change deliberately, and “just make people more candid” has defeated many reformers.

Either way, an evaluation system that ignores the cultural environment is designing for a world that doesn’t exist. The techniques page treats cultural change as one of the field’s core techniques for exactly this reason.