This would be true if all training were based on sentence completion. But training involving RLHF an...

dash2 • yesterday at 4:38 AM • 1 reply • view on HN

This would be true if all training were based on sentence completion. But training involving RLHF and RLAIF is increasingly important, isn't it?

Replies

root_axis • yesterday at 5:46 AM

Reinforcement learning is a technique for adjusting weights, but it does not alter the architecture of the model. No matter how much RL you do, you still retain all the fundamental limitations of next-token prediction (e.g. context exhaustion, hallucinations, prompt injection vulnerability etc)

➕ show 1 reply

alt Hacker News

Replies