
aspenmartin • today at 2:49 PM • 2 replies

Do you think AlphaGo is regurgitating human gameplay? No, it’s not: it’s learning an optimal policy based on self-play. That is essentially what you’re seeing with agents. People have a very misguided understanding of the training process and the implications of RL in verifiable domains. That’s why coding agents will certainly reach superhuman performance. Straw man or steel man, depending on what you believe: “But they won’t be able to understand systems! But a good spec IS programming!” is also a bad take: agents absolutely can interact with humans, interpret vague desiderata, fill in the gaps, and ask for direction. You are not going to need to write a spec the same way you need to today. It will be exactly like interacting with a very good programmer in EVERY sense of the word.

Replies

gf000 • today at 3:20 PM

How does AlphaGo come into the picture? It works in a completely different way altogether.

I'm not saying that LLMs can't solve new-ish problems that aren't part of the training data, but they sure as hell didn't get some Apple-specific library call from divine revelation.


LatencyKills • today at 3:05 PM

Well said.
