Hacker News

gf000 · yesterday at 3:20 PM

How does AlphaGo come into the picture? It works in a completely different way altogether.

I'm not saying that LLMs can't solve new-ish problems that aren't part of the training data, but they sure as hell didn't get some Apple-specific library call from divine revelation.


Replies

aspenmartin · yesterday at 3:29 PM

AlphaGo comes into the picture to explain that coding agents in verifiable domains are in fact trained in very similar ways.

It’s not magic: they can’t access information that isn’t available. But I’m not saying they are regurgitating or interpolating training data either. My point is that there is a misconception, stemming from a limited understanding of how coding agents are trained, that they are somehow limited by what’s in the training data, or are poorly interpolating that space. That may be true for some domains, but not for coding or mathematics.

AlphaGo is the right mental model here: RL in verifiable domains means your gradient steps take you in directions that are not limited by the quality or content of the training data, which is used only because starting RL from scratch is very inefficient. Human training data just gives the models a more efficient starting point for RL.
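The loop described above can be sketched in a few lines. This is a made-up minimal illustration, not how any production coding agent is actually trained: the candidate snippets, the verifier, and the plain REINFORCE update are all toy assumptions. The point it shows is the one in the comment: the reward comes from *checking* the output, not from labeled human data, so the policy can move away from what its initial ("pretrained") preferences favored.

```python
import math
import random

random.seed(0)

# Toy "policy": a softmax over three candidate snippets for one task
# (square the input x). All names here are hypothetical.
candidates = ["x + x", "x * x", "x ** 3"]

def verifier(expr):
    # Verifiable reward: run the snippet against checkable test cases.
    # No human label is consulted -- only whether the output is correct.
    return all(eval(expr, {"x": v}) == v * v for v in (2, 3, 5))

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

# "Pretraining" biases the policy toward a *wrong* candidate (index 0),
# standing in for imperfect human training data.
logits = [2.0, 0.0, 0.0]

# REINFORCE: sample a candidate, score it with the verifier, and nudge
# the logits toward rewarded samples (with a crude constant baseline).
lr = 0.5
for _ in range(500):
    probs = softmax(logits)
    i = random.choices(range(len(candidates)), weights=probs)[0]
    reward = 1.0 if verifier(candidates[i]) else 0.0
    advantage = reward - 0.5
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * advantage * grad

best = candidates[max(range(len(logits)), key=lambda j: logits[j])]
print(best)
```

Despite the initial bias toward the wrong snippet, the verifier-driven gradient steps shift probability mass onto the candidate that actually passes the checks, which is the sense in which the updates are "not limited by the quality of the training data".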