Unless I'm missing something, this argument seems to apply only to the original pretraining era (e.g., GPT-1 through GPT-4). The post-training and reinforcement learning paradigms are clearly doing variation, evaluation, and selective retention, no?
The transcript does seem to overlook post-training steps like Reinforcement Learning with Verifiable Rewards (RLVR), though I certainly won't claim that Rich Sutton is unaware of such things; RLVR has a very narrow set of evaluation approaches.
I wonder if this is a precursor to Keen Tech leaning into David Silver's Ineffable Intelligence approach.