logicchains • yesterday at 7:34 PM

It's cheaper to distill than to do reinforcement learning, so of course they prefer distillation, but if it weren't an option they could just pay up and spend more GPU time on RL.