Retr0id • today at 12:00 PM • 2 replies

Something being possible doesn't mean it's easy. Transforming a problem from a forbidden shape into an allowed shape could well be harder than just solving the original problem.

Replies

roenxi • today at 1:23 PM

I think the article just proved that aggressive exploitation is equivalent to normal bugfixing, so it seems like there are some large and important classes of transform that are easy.

It took me a minute of thinking to understand how this could even be considered a jailbreak. If Anthropic are going to turn out models that can't handle "find and develop regression test scripts for bugs in this program" as a prompt, then it is going to take serious model crippling. To be able to prompt the model, someone will need to already understand secure programming; the model itself won't be able to independently detect security problems without active guidance.

OutOfHere • today at 2:41 PM

It could be easier when you use a less smart uncensored model to control the smarter but censored one.
