
lwhi • yesterday at 12:50 AM • 1 reply

Information about the way we interact with the data can be used to refine agent behaviour (e.g. via RLHF).

While this isn't used specifically for LLM training, it can involve aggregating insights from customer behaviour.

Replies

That’s a training step. It requires explicitly collecting the data and using it in the training process.

Merely using an LLM for inference does not train it on the prompts and data, as many incorrectly assume. There is a surprising lack of understanding of this separation even on technical forums like HN.
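
A toy sketch of the separation described above. It assumes a hypothetical two-weight linear model rather than a real LLM, and the names `predict` and `train_step` are illustrative only. Inference reads the weights but never writes them. The weights change only when interactions are explicitly logged with feedback and passed to a training step. Here that step is plain supervised gradient descent, standing in for RLHF or any other fine-tuning method.

```python
# Hypothetical toy model, not any real LLM or library API.

def predict(weights, x):
    # Inference: a pure function of the current weights and the input.
    return sum(w * xi for w, xi in zip(weights, x))

def train_step(weights, batch, lr=0.1):
    # Training: one gradient-descent step on mean squared error over
    # (input, target) pairs that had to be collected explicitly.
    grads = [0.0] * len(weights)
    for x, y in batch:
        err = predict(weights, x) - y
        for i, xi in enumerate(x):
            grads[i] += 2 * err * xi / len(batch)
    # Returns new weights; the old list is left untouched.
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.5, -0.25]
prompts = [[1.0, 2.0], [3.0, 1.0]]

# Serving thousands of requests leaves the weights exactly as they were.
for p in prompts * 1000:
    predict(weights, p)
assert weights == [0.5, -0.25]

# Only logged interactions with (hypothetical) feedback targets, fed into
# an explicit training step, produce different weights.
logged = [(p, 1.0) for p in prompts]
new_weights = train_step(weights, logged)
print(new_weights)
```

Whether a provider ever runs that second step on user prompts is a data-collection and policy decision, not a side effect of inference.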
