AI Has a Trust Problem, Not a Capability Problem

This week handed us three different articulations of the same underlying anxiety. Waymo issued a recall because its robotaxis were driving through flooded roads. Fast Company published a piece arguing AI failure in enterprises is a leadership problem, not a technology problem. And a new arXiv paper titled 'Where Reliability Lives in Vision-Language Models' mapped the specific neural circuits where VLMs become untrustworthy. Three layers. Three scales. One crisis.

The Mechanistic Reality of AI Unreliability

The arXiv paper, by Mann et al. (2026), challenges a core intuition: that attention on relevant image regions makes vision-language models more trustworthy. Their mechanistic study of attention heads and causal circuits found that reliability is encoded deeper, in hidden state geometry, not surface-level attention patterns. This has enormous practical implications. It means that the proxies engineers use to trust AI outputs, looking at what the model is "looking at," are insufficient. Waymo's flooding problem is a physical manifestation of the same principle. The software was not attending to the wrong thing. Its world model lacked a robust representation of water depth as a threat category. The gap between capability and reliability is a circuit-level problem, not a prompt-level one.

Why Leadership Cannot Fix What Architecture Broke

Fast Company's argument that AI failure is a leadership problem is correct but incomplete. Leaders cannot reorganize their way out of a fundamentally unreliable system. What they can do is stop treating AI deployment as a technology decision and start treating it as a trust architecture decision. The Waymo recall is instructive: they issued a software patch while working on a "final remedy." That gap, between a patch and a remedy, is exactly where enterprise AI lives right now. TurboFund's Signal Report tracks Sam Altman flagging a qualitative leap in ChatGPT reliability this week, which reads differently against this backdrop. A leap in capability without a corresponding leap in mechanistic reliability is just a faster way to flood the road. NeurIPS is even being urged in a new preprint by Vishwarupe et al. to require reproducibility standards for frontier AI safety claims. The institutions are catching up to the circuits. Slowly.