Hacker News

danpalmer, today at 1:47 AM (1 reply)

What a joke. This must make it pretty easy to poison a session: you don't need to persuade the model of anything, just trigger its security controls, ideally after as much context as possible has built up but before it has generated any useful output.


Replies

kay_o, today at 1:55 AM

After all, what is roleplay or games but a jailbreak of guard rails? :]

I've even had it refuse CTFs while knowing it was a CTF, with a blatantly obvious CTF flag and no actual application involved.