logoalt Hacker News

int_19h01/20/20250 repliesview on HN

This is incorrect - while it's true that most cloud providers have a filtering pass on both inputs and outputs these days, the model itself is also censored via RLHF, which can be observed when running locally.

That said, for open-weights models, this is largely irrelevant because you can always "uncensor" it simply by starting to write its response for it such that it agrees to fulfill your request (e.g. in text-generation-webui, you can specify the prefix for response, and it will automatically insert those tokens before spinning up the LLM). I've yet to see any locally available model that is not susceptible to this simple workaround. E.g. with QwQ-32, just having it start the response with "Yes sir!" is usually sufficient.