logoalt Hacker News

coppsilgoldtoday at 1:28 AM0 repliesview on HN

This is more or less what happens. These models are tuned with reinforcement learning from human feedback (RLHF). Humans give them feedback that this type of language is good.

The notorious "it's not X, it's Y" pattern is somewhat rare from actual humans, but it's catnip for the humans providing the feedback.