I think that’s mostly because they get so much more of that reinforcement learning - since it is so ...

jeremyjh • today at 4:03 PM • 2 replies

I think that’s mostly because they get so much more of that reinforcement learning, since it is so economical. I don’t know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.

Replies

mjburgess • today at 4:36 PM

No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes


emp17344 • today at 6:19 PM

RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.
