kouteiheika • yesterday at 6:57 PM • 4 replies

Please don't.

All of this "security" and "safety" theater is completely pointless for open-weight models, because if you have the weights the model can be fairly trivially unaligned and the guardrails removed anyway. You're just going to unnecessarily lobotomize the model.

Here's some reading about a fairly recent technique to simultaneously remove the guardrails/censorship and delobotomize the model (it apparently gets smarter once you uncensor it): https://huggingface.co/blog/grimjim/norm-preserving-biprojec...

Replies

avadodin • yesterday at 10:47 PM

I already knew of this technique but it is so beautiful. It is likely that we have similar thought-suppressing structures in our brains.

ronsor • yesterday at 8:11 PM

"It rather involved being on the other side of this airtight hatchway."

https://devblogs.microsoft.com/oldnewthing/20060508-22/?p=31...

nottorp • yesterday at 8:49 PM

> it apparently gets smarter once you uncensor it

Interesting, that has always been my intuition.
