thorum • yesterday at 9:50 AM • 2 replies

Your other comment sounded like you were interested in learning about how AI labs are applying RL to improve programming capability. If so, the DeepSeek R1 paper is a good introduction to the topic (maybe a bit out of date at this point, but very approachable). RL training works fine for low-resource languages as long as you have tooling to verify outputs and enough compute to throw at the problem.

Replies

whimsicalism • yesterday at 3:39 PM

imo generally not worth it to keep going when you encounter this sort of HN archetype

measurablefunc • yesterday at 8:04 PM

So you should have no problem bringing up the exact passages & equations they use for their policies.
