ndr_ • today at 9:20 PM • 1 reply

These prompts chain several known LM exploits together. I ran experiments against gpt-oss-20b, and it became clear that the effectiveness didn't come from the gay factor at all; it can be attributed to language choice or role-play.

Technical report: https://arxiv.org/abs/2510.01259

Replies

Terr_ • today at 10:27 PM

When someone is blaming the jail-break phenomenon on "political overcorrectness" (versus the other techniques being used) I get a little suspicious about the author's own bias/agenda.
