Hacker News

ACCount37, yesterday at 10:07 PM

"Post-training" is too much of a conflation, because there are many post-training methods and each of them has its own quirky failure modes.

That being said? RLHF on user feedback data is model poison.

Users are NOT reliable model evaluators, and user feedback data should be handled with the same level of precaution as radioactive waste.

Professionals are not very reliable either, but users are so much worse.