andy12_ • today at 4:47 PM

Reinforcement learning is not done with random data found on the internet; it's done with curated, high-quality labeled datasets. There have been attempts to apply reinforcement learning to pre-training[1] (learning a predict-the-next-sentence objective in an unsupervised way), but as far as I know they don't scale.

[1] https://arxiv.org/pdf/2509.19249
