Hacker News

Aurornis, last Monday at 1:48 AM

RLHF is a training step.

Inference (what happens when you use an LLM as a customer) is a separate process from training.

Using an LLM doesn't train it. That's not what RLHF means.
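(A minimal PyTorch sketch of that distinction, using a toy linear model as a stand-in for an LLM: an inference-time forward pass leaves the weights untouched, and only an explicit training step updates them.)

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 8)            # toy stand-in for an LLM
    before = model.weight.clone()

    # "Inference": a forward pass with gradients disabled, as when a customer queries the model.
    with torch.no_grad():
        _ = model(torch.randn(1, 8))
    assert torch.equal(model.weight, before)       # weights unchanged

    # "Training": a separate process that actually updates the weights.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = model(torch.randn(1, 8)).pow(2).mean()
    loss.backward()
    opt.step()
    assert not torch.equal(model.weight, before)   # weights change only here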


Replies

popalchemist, last Monday at 2:18 AM

I am aware; I've trained my own models. You're being obtuse.

The big companies - Midjourney and OpenAI, for example - collect the feedback generated by users and then apply it as part of the RLHF pass on the next model release, which happens every few months. That's why their TOS includes terms that allow them to do that.
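(A rough sketch of that loop; the field names, storage format, and pairing strategy are illustrative assumptions, not any vendor's actual pipeline. The point is that feedback is only logged at inference time and is consumed later, in a separate training run for the next release.)

    import json

    FEEDBACK_LOG = "feedback.jsonl"

    def serve(prompt: str, completion: str, thumbs_up: bool) -> None:
        """Inference-time side: record the user's reaction; no weight updates happen here."""
        with open(FEEDBACK_LOG, "a") as f:
            f.write(json.dumps({"prompt": prompt,
                                "completion": completion,
                                "thumbs_up": thumbs_up}) + "\n")

    def build_preference_dataset(path: str = FEEDBACK_LOG) -> list[dict]:
        """Training-time side (next release): turn logged feedback into preference pairs
        for a reward-model / RLHF- or DPO-style pass."""
        with open(path) as f:
            records = [json.loads(line) for line in f]
        chosen = [r for r in records if r["thumbs_up"]]
        rejected = [r for r in records if not r["thumbs_up"]]
        # Pairing strategy is an assumption; real pipelines pair responses to the same prompt.
        return [{"prompt": c["prompt"], "chosen": c["completion"], "rejected": r["completion"]}
                for c, r in zip(chosen, rejected)]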