logoalt Hacker News

halJordanlast Sunday at 10:30 PM2 repliesview on HN

Thats not true at all. All refusals mediate in the same direction. If you abliterate small "acceptable to you" refusals then you will not overcome all the refusals in the model. By targeting the strongest refusals you break those and the weaker ones like politics. By only targeting the weak ones, you're essentially just fine tuning on that specific behavior. Which is not the point of abliteration.


Replies

flirlast Sunday at 11:27 PM

Still.... the tabloids are gonna love this.

will_occamyesterday at 5:59 PM

You're right, I read the code but missed the paper.