Hacker News
at2005
•
today at 6:50 AM
Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample.