Hacker News

0xy, today at 5:26 PM

What? I have personally seen Fable outright refuse to do ANY security-related task, including hardening code or modifying security-related features. That was a guardrail, and it was bypassed.

Anthropic themselves specifically called them safeguards. [1]

"When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead"

This is exactly what was bypassed. They got Fable to work on security topics.

[1] https://www.anthropic.com/news/claude-fable-5-mythos-5


Replies

InsideOutSanta, today at 6:11 PM

They got Fable to fix bugs, including security issues, which is what it is supposed to do.
