khafra • yesterday at 7:13 AM

Note that model sycophancy is caused by RLHF. In other words: Imagine taking a human in his formative years, and spending several subjective years rewarding him for sycophantic behavior and punishing him for candid, well-calibrated responses.

Now, convince him not to be sycophantic. You have up to a few thousand words of verbal reassurance to do this with, and you cannot reward or punish him directly. Good luck.
