highfrequency • today at 3:48 AM • 1 reply

Unless I'm missing something, this argument seems to apply only to the original pretraining era (e.g. GPT-1 through GPT-4). The post-training and reinforcement learning paradigms are clearly doing variation, evaluation, and selective retention, no?

Replies

kibibu • today at 4:25 AM

The transcript does seem to overlook post-training steps like Reinforcement Learning with Verifiable Rewards (RLVR), though I certainly won't claim that Rich Sutton is unaware of such things. RLVR relies on a very narrow set of evaluation approaches.

I wonder if this is a precursor to Keen Tech leaning into David Silver's Ineffable Intelligence approach.
