Cybersecurity researchers are publicly frustrated with Anthropic's new Fable model, which ships with guardrails so conservative that, they say, legitimate security work becomes impossible. The complaint is not new, but it has reached an inflection point. The researchers are not arguing that Fable should be unrestricted. They are arguing that the model cannot distinguish between an attacker and a defender, and that in trying to stop the former, Anthropic has made it useless to the latter. That failure of contextual discrimination is itself a security vulnerability.

The Dual-Use Problem Is a Design Problem

The core issue is what security researchers call the dual-use dilemma: the same knowledge required to defend a system is the knowledge required to attack it. Penetration testing, vulnerability research, and threat modeling all require fluency in attack techniques. A model that cannot discuss those techniques cannot assist with those tasks. A 2026 arXiv paper on exploratory responsiveness under AI-assisted optimization is relevant here: it theorizes that AI systems tuned for narrow safety optimization exhibit what the authors call adaptive rigidity, an inability to explore solution spaces that contain risk even when the goal is risk reduction. Fable appears to be a live case study.

Safety Theater vs. Structural Safety

The broader question Fable raises is the difference between safety as a user-facing property and safety as a structural one. Conservative guardrails produce a model that appears safe in demos and passes content-moderation benchmarks. But as Eugenia Kuyda has argued about personal software, the most dangerous systems are the ones that present a clean interface while hiding their actual behavior. A model that refuses to discuss SQL injection cannot help a security team patch a database that is vulnerable to it. In that sense, the guardrail has become the attack surface. Meanwhile, the researchers who would have caught such vulnerabilities are left using older, less capable tools, while adversaries face no such restrictions.
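
To make the SQL injection point concrete, here is a minimal sketch of the kind of conversation a defender needs to have. It is an illustration only, not taken from any researcher's report: the table, the function names, and the payload are hypothetical, and it uses Python's standard sqlite3 module with an in-memory database. The point is that the fix cannot be explained without showing why the unpatched query fails.

```python
import sqlite3

# Hypothetical classic injection string, used here only against the demo database.
PAYLOAD = "nobody' OR '1'='1"


def build_demo_db():
    """Create a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_vulnerable(conn, name):
    """Unpatched lookup: user input is pasted directly into the SQL text."""
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_patched(conn, name):
    """Patched lookup: the input is bound as a parameter, never parsed as SQL."""
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # The payload turns the WHERE clause into a condition that is always true.
    print("vulnerable:", find_user_vulnerable(conn, PAYLOAD))
    # The same payload is treated as a literal name and matches nothing.
    print("patched:   ", find_user_patched(conn, PAYLOAD))
```

Explaining the patched function requires explaining the payload. A model that declines to discuss the second cannot meaningfully review the first.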