Hacker News

InsideOutSanta, yesterday at 6:40 PM

Yes, I have read their own claims. Here's the relevant part:

"When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs."

Asking Fable to fix bugs in a code base is not "a request related to cybersecurity." When Fable was asked to fix bugs and then proceeded to fix bugs, that was not "removing guardrails". Fable did exactly what it should have done. Claiming otherwise makes absolutely no sense at all.


Replies

0xy, yesterday at 7:27 PM

Fable specifically refused to harden the security of codebases. If you use misdirection to force Fable to do just that, that's the removal of a guardrail.

Anthropic specifically stated that ANY security requests should be shunted to Opus 4.8. This was bypassable.

I don't see what your confusion here is. Fable was prevented from working on any security tasks. A significant number of people, myself included, witnessed Fable refusing to harden code as a result. Bypassing that is a bypass of guardrails.

Your assertion that working on security is not working on security because you used misdirection is, of course, preposterous.

You wouldn't be making the same claim if Fable refused to work on chemical weapons research but happily proceeded to do so if you claimed it was for eradicating pests.