Hacker News

krackers, yesterday at 10:01 PM

I think it's already out of date given verifiable-reward-based RL, e.g. in the maths domain. When the "correctness" arguments fall, the argument will probably just shift to whether it's merely "intelligent brute force".


Replies

gipp, today at 12:10 AM

The set of tasks for which "correctness" is formally verifiable (in a way that doesn't put Goodhart's Law into hyperdrive) is vanishingly small.

TheOtherHobbes, yesterday at 11:04 PM

"stochastic genius"