logoalt Hacker News

andy12_today at 4:47 PM0 repliesview on HN

Reinforcement learning is not done with random data found on the internet; it's done with curated high-quality labeled datasets. Although there have been approaches that try to apply reinforcement learning to pre-training[1] (to learn in an unsupervised way a predict-the-next-sentence objective), as far as I know it doesn't scale.

[1] https://arxiv.org/pdf/2509.19249