Hacker News

nirvdrum, today at 3:04 PM

For anyone else unfamiliar with the term:

RLHF = Reinforcement Learning from Human Feedback

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...