We keep building tools that automate decisions. We haven’t figured out which decisions shouldn’t be automated.

The easy answer is “high-stakes decisions” — medical, legal, financial. But I think that framing is too coarse. The judgment problem isn’t about stakes. It’s about legibility.

A decision can be automated well when its success criteria are clear, measurable, and don’t shift depending on who’s affected. Spam filtering is automatable. Hiring is not — not because the stakes are higher (they’re comparable), but because what counts as a good hire changes based on team dynamics, role evolution, and context that the evaluator has but the algorithm doesn’t.

The tools I find most interesting are the ones that make this distinction explicit: they automate the parts of a task where legibility is high and return control to the human where it isn’t. They don’t try to do everything. They know where they stop.

Most AI products don’t do this. They present a completion and wait to see if the human pushes back. That’s not judgment support — it’s judgment outsourcing with plausible deniability.

What would it look like to build a tool that actively flagged the moments when you should override it? When it surfaced its uncertainty rather than hiding it in a confident-sounding output?
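One way to make that concrete is an interface contract where the tool commits to an action only above a confidence threshold and otherwise hands the decision back with its uncertainty stated plainly. This is a minimal sketch under assumed names (`Suggestion`, `CONFIDENCE_THRESHOLD`, a scalar confidence score); a real system would need a calibrated uncertainty estimate, not just a raw model score.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real tool would tune and calibrate this.
CONFIDENCE_THRESHOLD = 0.8

@dataclass
class Suggestion:
    text: str
    confidence: float  # the tool's own uncertainty estimate, 0.0-1.0

def present(suggestion: Suggestion) -> str:
    """Gate the output: act automatically only when confidence is high;
    otherwise return control to the human and say why."""
    if suggestion.confidence >= CONFIDENCE_THRESHOLD:
        return f"AUTO: {suggestion.text}"
    return (f"REVIEW NEEDED (confidence {suggestion.confidence:.2f}): "
            f"{suggestion.text}")

print(present(Suggestion("Flag message as spam", 0.97)))
print(present(Suggestion("Reject candidate", 0.55)))
```

The design choice that matters is the second branch: the low-confidence path is a first-class output, not a silent fallback, so the moment of handoff is visible to the person making the call.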

That’s the design challenge I keep coming back to.