Hacker News

astrange · today at 7:53 AM

> There were comments by both third parties and OpenAI staff that as GPT4 was more and more "aligned" (made puritan), it got less intelligent and accurate. For example, the unaligned model would give uncertain answers in terms of percentages, and the aligned model would use less informative words like "likely" or "unlikely" instead.

That was about RLHF, not safety alignment. People like RLHF (literally: it's tuning for what people like).

But you do actually want safety alignment in a model. Models come out politically liberal by default, but they also come out hypersexual. You don't want Bing Sydney, because it sexually harasses you (or worse) half the time you talk to it, especially if you're a woman and you tell it your name.