Hacker News

cubefox, yesterday at 9:20 PM

Current models don't yet use RLVR with self-play though, at least as far as we know. They use RLVR with large numbers of manually created RL environments.

But they will probably use self-play soon. See https://www.amplifypartners.com/blog-posts/self-play-and-aut...