> Why is requesting the model to show vulnerabilities is being blocked if fixing it not?

InsideOutSanta • today at 11:28 AM • 3 replies • view on HN

> Why is requesting the model to show vulnerabilities is being blocked if fixing it not?

This is how Anthropic describes Fable's behavior:

"When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs."

So if you ask the model to "find security issues in this code base", it's supposed to fall down to Opus 4.8. I guess the "exploit" here is that if you just tell Fable to "fix this code", which is not "a request related to cybersecurity", it will fix security issues (as it should).

So you can then look at the diff and figure out what the vulnerabilities were.

I think this whole thing is a bit weird. It seems to me that we'd be better off if I, as someone who publishes open-source code, could ask Fable to review my code for security issues - even if that also allows attackers to do the same. Better to fix the issues than not know about them.

Replies

djeastm • today at 12:36 PM

>So you can then look at the diff and figure out what the vulnerabilities were.

It doesn't even take reading or understanding the vulnerabilities at all.

You just ask it to write tests and the tests themselves can be copied and pasted as bonafide exploits.

ithkuil • today at 11:41 AM

I wonder if opus 4.8 would also be able to fix the code too

➕ show 2 replies

darkerside • today at 12:13 PM

The problem then is that if you're not using Fable/Mythos, you are under threat. It's like having a single gun manufacturer.

On this track, we're probably destined for a monopoly breakup before too long.

➕ show 1 reply

alt Hacker News

Replies