Hacker News

hyperpape, yesterday at 5:30 PM

Which is difficult, because the fact that you can come up with your example questions tells us they're probably not very dangerous. Plenty of ink has been spilled about how LLMs could help people create bioweapons. The basic idea "you could do dangerous things with an LLM" is already pop culture, and you're not doing anything dangerous by giving easy example questions.

A dangerous question would have to be along the lines of "Could I use unobtanium with the Tony Stark process to produce explosives much more powerful than nuclear weapons?" so that the question itself contains some insight that gets you closer to doing something dangerous.

Perhaps the reason for not publishing the questions is twofold: 1) they want a universal jailbreak that can get the model to answer any "bad" question, and 2) they don't want the bad publicity when someone not under NDA jailbreaks their model and answers those questions.


Replies

dist-epoch, yesterday at 8:06 PM

> because the fact that you can come up with your example questions tells us they're probably not very dangerous

maybe I know more about this field than you think

there are biologists on video saying that present-day models have expert-level wet-lab knowledge and can guide a novice through entire procedures

models have also been able to tweak DNA sequences so that they bypass DNA-printing companies' screening filters

> they don't want bad publicity when someone not under NDA jailbreaks their model and answers their question

just like people now pay $500k for Chrome vulnerabilities, soon people will pay similar amounts to jailbreak models to do bad things
