Hacker News

popalchemist · yesterday at 1:40 AM

Wrong, buddy.

Many of the top AI services use human feedback to continuously apply "reinforcement learning" after the initial deployment of a pre-trained model.

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
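
The linked article describes a two-stage recipe: fit a reward model to human preference comparisons, then fine-tune the policy against it. A minimal toy sketch of that structure follows, with made-up feature vectors standing in for real responses and a crude REINFORCE-style step standing in for PPO; nothing here reflects any vendor's actual pipeline.

    import numpy as np

    rng = np.random.default_rng(0)

    # 1) Reward model: Bradley-Terry logistic fit on human preference pairs
    #    (features of the preferred response vs. features of the rejected one).
    prefs = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(100)]
    w = np.zeros(4)                                    # reward-model weights
    for _ in range(200):
        grad = np.zeros(4)
        for chosen, rejected in prefs:
            p = 1.0 / (1.0 + np.exp(-(w @ chosen - w @ rejected)))
            grad += (1.0 - p) * (chosen - rejected)    # ascend the log-likelihood
        w += 0.01 * grad / len(prefs)

    def reward(features):
        return float(w @ features)

    # 2) Policy update: push a toy Gaussian "policy" toward higher-reward samples.
    #    A crude REINFORCE-style stand-in for the PPO step real pipelines use.
    theta = rng.normal(size=4)
    for _ in range(50):
        sample = theta + rng.normal(scale=0.1, size=4)
        theta += 0.05 * reward(sample) * (sample - theta)

Both stages are training runs over collected data; neither happens inside a customer's chat session.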


Replies

Aurornis · yesterday at 1:48 AM

RLHF is a training step.

Inference (what happens when you use an LLM as a customer) is separate from training.

Using an LLM doesn’t train it; that’s not what RLHF means.
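
A minimal sketch of that separation, using a toy PyTorch linear layer as a stand-in for an LLM (illustrative only, not any provider's serving code):

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 2)              # stand-in for an LLM
    x = torch.randn(1, 8)                # stand-in for a user prompt

    # Inference: what happens when a customer calls the model.
    model.eval()
    with torch.no_grad():                # no gradients are tracked
        _ = model(x)                     # the weights are untouched afterwards

    # Training (e.g. an RLHF fine-tuning phase): a separate job run by the lab.
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss = nn.functional.cross_entropy(model(x), torch.tensor([1]))
    loss.backward()
    optimizer.step()                     # only this step changes the weights

The deployed model's parameters only change when the lab runs a training job like the second half and ships the result; serving requests is just the first half.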
