wwind123 • today at 6:37 AM • 1 reply

This kind of approach would generally still need human guidance; otherwise these models might get stuck in weird niche corners of the problem space that aren't relevant to any real-world project.

Replies

ben_w • today at 7:19 AM

We could call this "reinforcement learning from human feedback" (RLHF) :)

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
