Hacker News

janalsncm, today at 2:14 AM

Their R1 paper was really well done, but I think it leaves out a few details necessary for stable training.

https://cameronrwolfe.substack.com/p/grpo-tricks