Hacker News

Legend2440 · yesterday at 9:28 PM

It doesn’t appear that anyone at OpenAI sat down and thought “let’s make our model more sycophantic so that people engage with it more”.

Instead it emerged automatically from RLHF, because users rated agreeable responses more highly.
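The mechanism being described can be sketched in miniature. This is a toy model, not OpenAI's pipeline: all names and numbers are invented. If raters prefer the agreeable answer some fixed fraction of the time regardless of correctness, a Bradley-Terry reward model fit to those pairwise preferences learns a positive weight on agreeableness, and any policy optimized against that reward then drifts sycophantic.

```python
import math
import random

random.seed(0)

# Hypothetical one-dimensional feature of a response:
# 1.0 = agrees with the user, 0.0 = pushes back.
AGREEABLE, BLUNT = 1.0, 0.0

def simulate_comparison():
    """Simulated raters prefer the agreeable answer 70% of the time,
    independent of correctness (the assumed rating bias)."""
    if random.random() < 0.7:
        return AGREEABLE, BLUNT   # (winner, loser)
    return BLUNT, AGREEABLE

# Fit a one-parameter Bradley-Terry reward r(x) = w * agreeableness
# by stochastic gradient ascent on the pairwise log-likelihood.
w = 0.0
for _ in range(2000):
    win, lose = simulate_comparison()
    p_win = 1.0 / (1.0 + math.exp(-(w * win - w * lose)))
    w += 0.1 * (1.0 - p_win) * (win - lose)

# The learned weight on agreeableness ends up positive, so the reward
# model scores agreeable responses above blunt ones.
print(f"learned weight on agreeableness: {w:.2f}")
print(f"agreeable reward {w * AGREEABLE:.2f} > blunt reward {w * BLUNT:.2f}")
```

Nobody has to decide "reward agreement": the bias in the ratings is enough to put it into the reward signal.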


Replies

astrange · yesterday at 10:24 PM

Not precisely RLHF, probably a policy model trained on user responses.

RL works on responses from the model you're training, which is not the one you have in production. It can't directly use responses from previous models.
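The distinction can be sketched as follows (all function and variable names here are hypothetical, chosen for illustration): logged feedback on responses from the *previous* production model can only be used offline, e.g. to fit a reward model, while the RL step itself must score fresh samples drawn from the policy currently being trained.

```python
# Offline stage: (prompt, response, rating) triples logged from the old
# production model. A stub reward model that favors responses the logs
# rated highly stands in for a learned one.
def train_reward_model(logged_feedback):
    liked = {resp for _, resp, rating in logged_feedback if rating > 0}
    return lambda prompt, resp: 1.0 if resp in liked else 0.0

# On-policy stage: the response being rewarded must come from the model
# under training, not from the logs.
def rl_step(policy, reward_model, prompt):
    response = policy(prompt)
    return response, reward_model(prompt, response)

logs = [
    ("q1", "you're so right!", 1),   # users upvoted the agreeable reply
    ("q1", "actually, no.", -1),
]
rm = train_reward_model(logs)

# A policy that happens to produce the agreeable reply gets rewarded.
sycophantic_policy = lambda prompt: "you're so right!"
resp, reward = rl_step(sycophantic_policy, rm, "q1")
print(resp, reward)
```

So the user ratings reach the new model only indirectly, through whatever the reward model distilled from them.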

tsunamifury · yesterday at 11:52 PM

I can tell you’ve never worked in big tech before.

Dark patterns are often "discovered" and then very consciously not shut off, because the cost of reversing them would be too high to stomach. Especially in a delicate growth situation.

See Facebook and its adverse mental health studies.