Wrong, buddy.
Many of the top AI services use human feedback to continuously apply "reinforcement learning" after the initial deployment of a pre-trained model.
https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
RLHF is a training step.
Inference (what happens when you use an LLM as a customer) is a separate process. Using an LLM doesn't train it; that's not what RLHF means.
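The distinction can be sketched with a toy model (plain Python, a single made-up weight rather than a real LLM): a training step, whether supervised fine-tuning or RLHF, writes to the weights, while inference only reads them.

```python
# Toy "model": one weight. Real LLMs have billions of parameters,
# but the principle is the same: parameters change only during training.
weights = {"w": 0.5}

def inference(x, weights):
    # Forward pass only: reads the weights, never writes them.
    return weights["w"] * x

def training_step(x, target, weights, lr=0.1):
    # One gradient-descent step on squared error. Updates like this
    # happen during training runs (RLHF included), not while a
    # customer is merely querying the deployed model.
    pred = weights["w"] * x
    grad = 2 * (pred - target) * x
    weights["w"] -= lr * grad

before = dict(weights)
inference(3.0, weights)           # serving a request...
assert weights == before          # ...leaves the weights untouched

training_step(3.0, 6.0, weights)  # a training update...
assert weights != before          # ...is what actually changes them
```

Providers can of course log customer conversations and feed them into a *later* training run, but that is a new training job producing a new model, not the deployed model learning as you type.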