sciencejerk • today at 6:02 PM • 1 reply

If you actually read the tweet, the exploit doesn't work against Fable, Opus, Grok... at least, not in the examples.

Jailbreaks do work against these models (look on GitHub), and they use similar strategies of mixing SAFE text with malicious text, or malicious with even more malicious, etc., but the working jailbreaks I've seen are pretty long, complicated, and even... creepy.

Replies

csomar • today at 6:08 PM

Did you actually read what the tweet/blog post are about?
