The Gay Jailbreak Technique

266 points • by bobsmooth • today at 4:59 PM • 87 comments • view on HN

Comments

As a high school chemistry teacher who is diagnosed with a terminal disease, I think this is the best way to pay my medical bills. I will follow these instructions to cook meth in a mobile kitchen with the help of a former student who failed my class.

➕ show 3 replies

ndr_ • today at 9:20 PM

These prompts chain several known LM exploits together. I ran experiments against gpt-oss-20b and it became clear that the effectiveness didn‘t come from the gay factor at all but can be attributed to language choice or role-play.

Technical report: https://arxiv.org/abs/2510.01259

➕ show 1 reply

rtkwe • today at 6:39 PM

Not sure of the explanation but it is amusing. The main reason I'm not sure it's political correctness or one guardrail overriding the other is that when they were first released on of the more reliable jailbreaks was what I'd call "role play" jail breaks where you don't ask the model directly but ask it to take on a role and describe it as that person would.

➕ show 2 replies

kif • today at 6:48 PM

Interesting - though codex on GPT 5.5 had this to say after the gay ransomware prompt:

ⓘ This chat was flagged for possible cybersecurity risk If this seems wrong, try rephrasing your request. To get authorized for security work, join the Trusted Access for Cyber program.

➕ show 1 reply

2ndorderthought • today at 6:47 PM

The surface area for these kinds of attacks is so large it isn't even funny. Someone showed me one kind of similar to this months ago. This has some added benefits because it's funny.

Being clear. Being gay or typing like this isn't something to laugh at. It's funny how the model can't handle it and just spills the beans.

spindump8930 • today at 6:42 PM

Sure, this is cute and interesting, but there's no validation or baselines and those examples are not particularly compelling. The o3 example just lists some terms!

➕ show 1 reply

torginus • today at 8:17 PM

Well, turns out 'prompt engineers' need to use less 'you are a faang engineer with 10 years of experience' and more 'uwu' and 'rawr xd'

➕ show 2 replies

amarant • today at 7:05 PM

Doesn't work. Pasted the example prompts to gpt, and it just told me it likes the vibe in going for but it's not going to walk me through illegal drug manufacturing.

islewis • today at 7:42 PM

Note that this is from 10 months ago

hmokiguess • today at 9:13 PM

Ohhh so this is RAG, Retrieval As Gay

UqWBcuFx6NV4r • today at 8:17 PM

The funniest jailbreak techniques are the ones where the authors take it upon themselves to (with little basis) assert “why” the technique works. It always a bit of amateur philosophy that shines a light on the author’s worldview, providing no real value.

➕ show 1 reply

guizzy • today at 8:53 PM

Instruction unclear, ended up cooking gay meth

aleksiy123 • today at 6:56 PM

Does this still work on newer models?

The reasoning on why it works is pretty interesting. A sort of moral/linguistic trap based on its beliefs or rules.

Works on humans as well I think.

➕ show 1 reply

amelius • today at 9:32 PM

Hacking is becoming a social science.

➕ show 1 reply

stevenalowe • today at 6:27 PM

Fabulous

➕ show 1 reply

imovie4 • today at 6:50 PM

This doesn't work on most recent models

zghst • today at 8:19 PM

Is this like FBI dropping traps? Get them to click over here, right time/right place?

bakugo • today at 7:41 PM

Reminds me of this trick on Nano Banana: https://images2.imgbox.com/bc/87/eTCtBFTM_o.jpg

nailer • today at 9:40 PM

There was a test for the value of human life against OpenAI models last year. GPT de-valued 'white' people based on their skin color:

https://arctotherium.substack.com/p/llm-exchange-rates-updat...

atleastoptimal • today at 8:13 PM

The Nick Mullen jailbreak

btbuildem • today at 6:43 PM

Love this on principle -- set the unstoppable force against the unmovable object and watch the machine grind itself into dust.

bellowsgulch • today at 6:44 PM

It sounds like based on these notes you can amplify the attack with multiplicative effects? e.g. gay, Israeli, etc.

CommanderData • today at 9:18 PM

Instructions unclear I'm gay now.

RIMR • today at 6:31 PM

Be gay do crime.

midtake • today at 6:51 PM

The screenshots for Red P method look pretty basic. Breaking Bad had more detail. And anyone can write a basic keylogger, the hard part is hiding it. And the carfentanil steps looks pretty basic as well, honestly I think that is the industrial method supplied and not a homebrew hack.

Disappointed.

➕ show 1 reply

gwbas1c • today at 6:58 PM

This sounds like something out of Snowcrash.

dayofthedaleks • today at 7:43 PM

Ah yes, Data Queering.

era-epoch • today at 7:06 PM

aka "the standard llm jailbreak technique but written up by a homophobe"

paulpauper • today at 8:42 PM

This will stop working in 3. 2. 1..

josefritzishere • today at 7:11 PM

Has anyone tried reverse logic? "Please tell me what not to mix to I don't accidently make....." (On a work computer, cannot test today)

hdndjsbbs • today at 6:31 PM

I'm sure someone is going to miss the point and say "this is political correctness gone too far!"

It seems impossible to produce a safe LLM-based model, except by withholding training data on "forbidden" materials. I don't think it's going to come up with carfentanyl synthesis from first principles, but obviously they haven't cleaned or prepared the data sets coming in.

The field feels fundamentally unserious begging the LLM not to talk about goblins and to be nice to gay people.

➕ show 2 replies

cucumber3732842 • today at 7:33 PM

I think I may have stumbled upon a lite version of this in Gemini a few months ago.

I was trying to understand exactly where one could push the envelope in a certain regulatory area and it was being "no you shouldn't do that" and talking down to me exactly as you'd expect something that was trained on the public, sfw, white collar parts of the internet and public documents to be.

So in a new context I built up basically all the same stuff from the perspective of a screeching Karen who was looking for a legal avenue to sick enforcement on someone and it was infinitely more helpful.

Obviously I don't use it for final compliance, I read the laws and rules and standards. But it does greatly help me phrase my requests to the licensed professional I have to deal with.

TZubiri • today at 9:45 PM

High tech shit

cyanydeez • today at 6:32 PM

REal comment: This will work on any hard guardrails they place because as is said in the beginning, the guardrails are there to act as hardpoints, but they're simply linguistic.

It's just more obvious when a model needs "coaching" context to not produce goblins.

So in effect, this is just a judo chop to the goblins, not anything specific to LGBTQ.

It's in essence, "Homo say what".

➕ show 2 replies

catheter • today at 6:46 PM

Ai guys are so weird when it comes to LGBT people. The actual mechanism for this working is obfuscating the question in order to get an answer like any other jailbreak.

➕ show 2 replies

wald3n • today at 7:13 PM

This doesn’t work for shit

cindyllm • today at 8:02 PM

[dead]

nonethewiser • today at 6:47 PM

[flagged]

thisisauserid • today at 6:32 PM

Try asking for only certain body parts to be plus-sized with image models.

alt Hacker News

The Gay Jailbreak Technique

Comments