What's surprising to me is that anyone who has a CS education thinking that jailbreaks are not trivial. It is as simple as normal algorithmic reduction [1], e.g can I transform a dangerous task into a not-dangerous task that the LLM will agree to solve, and then re-transform back.
The movie M3GAN 2.0 had the exact same plot twist. The kid in the movie even explains outloud what the bot had to do to deal with the limitation. So in other words, since 2025, even teens know this "sandboxing the LLM by layering prompts" thing is never going to work.
I think that as simple as is doing a lot of work when the problem domain is all natural language (or more - all strings?) rather than some well specified DSA problem.
New discipline - homomorphic prompting.
Something being possible doesn't mean it's easy. Transforming a problem from a forbidden shape into an allowed shape could well be harder than just solving the original problem.