Hacker News

popalchemist · yesterday at 1:40 AM

Wrong, buddy.

Many of the top AI services use human feedback to continuously apply "reinforcement learning" after the initial deployment of a pre-trained model.

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
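
The linked article describes a two-stage recipe: fit a reward model to human preference comparisons, then fine-tune the policy against it. A minimal toy sketch of that structure follows, with made-up feature vectors standing in for real responses and a crude REINFORCE-style step standing in for PPO; nothing here reflects any vendor's actual pipeline.

    import numpy as np

    rng = np.random.default_rng(0)

    # 1) Reward model: Bradley-Terry logistic fit on human preference pairs
    #    (features of the preferred response vs. features of the rejected one).
    prefs = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(100)]
    w = np.zeros(4)                                    # reward-model weights
    for _ in range(200):
        grad = np.zeros(4)
        for chosen, rejected in prefs:
            p = 1.0 / (1.0 + np.exp(-(w @ chosen - w @ rejected)))
            grad += (1.0 - p) * (chosen - rejected)    # ascend the log-likelihood
        w += 0.01 * grad / len(prefs)

    def reward(features):
        return float(w @ features)

    # 2) Policy update: push a toy Gaussian "policy" toward higher-reward samples.
    #    A crude REINFORCE-style stand-in for the PPO step real pipelines use.
    theta = rng.normal(size=4)
    for _ in range(50):
        sample = theta + rng.normal(scale=0.1, size=4)
        theta += 0.05 * reward(sample) * (sample - theta)

Both stages are training runs over collected data; neither happens inside a customer's chat session.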


Replies

Aurornis · yesterday at 1:48 AM

RLHF is a training step.

Inference (what happens when you use an LLM as a customer) is separate from training.

Using an LLM doesn’t train it; that’s not what RLHF means.
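
A minimal sketch of that separation, using a toy PyTorch linear layer as a stand-in for an LLM (illustrative only, not any provider's serving code):

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 2)              # stand-in for an LLM
    x = torch.randn(1, 8)                # stand-in for a user prompt

    # Inference: what happens when a customer calls the model.
    model.eval()
    with torch.no_grad():                # no gradients are tracked
        _ = model(x)                     # the weights are untouched afterwards

    # Training (e.g. an RLHF fine-tuning phase): a separate job run by the lab.
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss = nn.functional.cross_entropy(model(x), torch.tensor([1]))
    loss.backward()
    optimizer.step()                     # only this step changes the weights

The deployed model's parameters only change when the lab runs a training job like the second half and ships the result; serving requests is just the first half.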
