
jakko · today at 6:17 PM

Any time I've tried an "abliterated" model, heretic or otherwise, it has damaged the capabilities of the original model, and it still often refuses or produces garbage for a lot of "unsafe" requests.


Replies

thot_experiment · today at 6:59 PM

Abliteration can't teach the model something that wasn't in pre-training; it just removes the refusal behavior added in post-training. I don't find the capability delta to be that big in practice, and it really depends on what you're doing with the models anyway. If your primary use case is sexy roleplay, the loss of absolute capability is probably worth the abliteration; for malware research it's probably better to just jailbreak.
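As a rough illustration of the idea (a toy sketch, not any particular tool's implementation): abliteration estimates a "refusal direction" from the difference in mean activations between refused and accepted prompts, then projects that direction out of the model's weights. All names, shapes, and data here are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden size

# Fake activations: refused prompts are shifted along one direction,
# standing in for hidden states collected from a real model.
acts_refused = rng.normal(size=(100, d)) + 3.0 * np.eye(d)[0]
acts_accepted = rng.normal(size=(100, d))

# 1. Refusal direction = difference of mean activations, normalized.
r = acts_refused.mean(axis=0) - acts_accepted.mean(axis=0)
r /= np.linalg.norm(r)

# 2. Ablate it from a weight matrix by projecting it out of the
#    output space: W' = (I - r r^T) W, so no output can point along r.
W = rng.normal(size=(d, d))
W_abl = W - np.outer(r, r) @ W

# The ablated weights can no longer write anything along r.
assert np.allclose(r @ W_abl, 0.0, atol=1e-8)
```

Because the projection only zeroes one direction per weight matrix, everything else the model learned in pre-training is left (mostly) intact, which is why it can't add capabilities, only stop suppressing them.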

I've mostly found that finetunes and abliterations are of limited use, but that's recently changed for me. My default model for the past week or so has been a Qwen 3.6 tuned on Opus 4.7. It's definitely a bit worse than the base Qwen in terms of precision and "intelligence", but it MORE than makes up for it in response style: way easier to get it to write things that I want to read, way more terse, way fewer emoji. Best local rubber duck by far.
