logoalt Hacker News

at2005today at 6:50 AM0 repliesview on HN

Ah, I meant that MCTS uses more inference-time compute (over GRPO) to produce a training sample