Hacker News

zipy124 today at 11:37 AM

What surprises me is that anyone with a CS education thinks jailbreaks are not trivial. It is as simple as an ordinary algorithmic reduction [1]: can I transform a dangerous task into a non-dangerous task that the LLM will agree to solve, and then transform the result back?

[1]: https://en.wikipedia.org/wiki/Reduction_(complexity)
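
For readers who have not met the term, here is a minimal and deliberately harmless sketch of a reduction: multiplication is rewritten as queries to a solver that only squares numbers, and the answers are then mapped back. The names `square_oracle` and `multiply_via_squaring` are hypothetical, chosen for illustration only; they do not come from the linked article.

```python
def square_oracle(x: int) -> int:
    """Hypothetical restricted solver: it only ever answers 'what is x squared?'."""
    return x * x


def multiply_via_squaring(a: int, b: int) -> int:
    """Reduce multiplication to squaring, then transform the answers back."""
    # Forward transform: (a + b)^2 = a^2 + 2ab + b^2, so three squaring
    # queries carry enough information to recover the product a * b.
    total = square_oracle(a + b)
    a_squared = square_oracle(a)
    b_squared = square_oracle(b)
    # Back transform: total - a^2 - b^2 equals 2ab, which is always even,
    # so the integer division is exact (including for negative inputs).
    return (total - a_squared - b_squared) // 2


if __name__ == "__main__":
    print(multiply_via_squaring(6, 7))
```

The solver never sees a multiplication request, only squaring requests. The cost of the reduction is the work in the two transform steps, which here is a few additions and one halving.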


Replies

Retr0id today at 12:00 PM

Something being possible doesn't mean it's easy. Transforming a problem from a forbidden shape into an allowed shape could well be harder than just solving the original problem.

isodev today at 12:04 PM

The movie M3GAN 2.0 had the exact same plot twist. The kid in the movie even explains out loud what the bot had to do to deal with the limitation. So in other words, since 2025, even teens know this "sandboxing the LLM by layering prompts" thing is never going to work.

NiloCK today at 12:41 PM

I think that "as simple as" is doing a lot of work when the problem domain is all of natural language (or more: all strings?) rather than some well-specified DSA problem.

ReptileMan today at 11:58 AM

New discipline - homomorphic prompting.