Hacker News

Legend2440 · yesterday at 9:28 PM

It doesn’t appear that anyone at OpenAI sat down and thought “let’s make our model more sycophantic so that people engage with it more”.

Instead it emerged automatically from RLHF, because users rated agreeable responses more highly.
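The mechanism being described can be sketched in miniature. This is a toy model, not OpenAI's pipeline: all names and numbers are invented. If raters prefer the agreeable answer some fixed fraction of the time regardless of correctness, a Bradley-Terry reward model fit to those pairwise preferences learns a positive weight on agreeableness, and any policy optimized against that reward then drifts sycophantic.

```python
import math
import random

random.seed(0)

# Hypothetical one-dimensional feature of a response:
# 1.0 = agrees with the user, 0.0 = pushes back.
AGREEABLE, BLUNT = 1.0, 0.0

def simulate_comparison():
    """Simulated raters prefer the agreeable answer 70% of the time,
    independent of correctness (the assumed rating bias)."""
    if random.random() < 0.7:
        return AGREEABLE, BLUNT   # (winner, loser)
    return BLUNT, AGREEABLE

# Fit a one-parameter Bradley-Terry reward r(x) = w * agreeableness
# by stochastic gradient ascent on the pairwise log-likelihood.
w = 0.0
for _ in range(2000):
    win, lose = simulate_comparison()
    p_win = 1.0 / (1.0 + math.exp(-(w * win - w * lose)))
    w += 0.1 * (1.0 - p_win) * (win - lose)

# The learned weight on agreeableness ends up positive, so the reward
# model scores agreeable responses above blunt ones.
print(f"learned weight on agreeableness: {w:.2f}")
print(f"agreeable reward {w * AGREEABLE:.2f} > blunt reward {w * BLUNT:.2f}")
```

Nobody has to decide "reward agreement": the bias in the ratings is enough to put it into the reward signal.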


Replies

astrange · yesterday at 10:24 PM

Not precisely RLHF, probably a policy model trained on user responses.

RL works on responses from the model you're training, which is not the one you have in production. It can't directly use responses from previous models.
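The distinction can be sketched as follows (all function and variable names here are hypothetical, chosen for illustration): logged feedback on responses from the *previous* production model can only be used offline, e.g. to fit a reward model, while the RL step itself must score fresh samples drawn from the policy currently being trained.

```python
# Offline stage: (prompt, response, rating) triples logged from the old
# production model. A stub reward model that favors responses the logs
# rated highly stands in for a learned one.
def train_reward_model(logged_feedback):
    liked = {resp for _, resp, rating in logged_feedback if rating > 0}
    return lambda prompt, resp: 1.0 if resp in liked else 0.0

# On-policy stage: the response being rewarded must come from the model
# under training, not from the logs.
def rl_step(policy, reward_model, prompt):
    response = policy(prompt)
    return response, reward_model(prompt, response)

logs = [
    ("q1", "you're so right!", 1),   # users upvoted the agreeable reply
    ("q1", "actually, no.", -1),
]
rm = train_reward_model(logs)

# A policy that happens to produce the agreeable reply gets rewarded.
sycophantic_policy = lambda prompt: "you're so right!"
resp, reward = rl_step(sycophantic_policy, rm, "q1")
print(resp, reward)
```

So the user ratings reach the new model only indirectly, through whatever the reward model distilled from them.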

tsunamifury · yesterday at 11:52 PM

I can tell you’ve never worked in big tech before.

Dark patterns are often "discovered" and then very consciously not shut off, because the cost of reversing them would be too high to stomach. Especially in a delicate growth situation.

See Facebook and its adverse mental health studies.