Hacker News

sciencejerk, today at 6:02 PM

If you actually read the tweet, the exploit doesn't work against Fable, Opus, or Grok, at least in the examples.

Jailbreaks do work against the models (look on GitHub), and they do use similar strategies of mixing SAFE text with malicious text, or malicious text with even more malicious text, etc. But the working jailbreaks I've seen are pretty long and complicated and even...creepy.


Replies

csomar, today at 6:08 PM

Did you actually read what the tweet/blog post are about?