krackers • yesterday at 10:01 PM • 2 replies

I think it's already out of date with verifiable-reward-based RL, e.g. in the maths domain. When the "correctness" arguments fall, the argument will probably just shift to whether it's just "intelligent brute force".

Replies

gipp • today at 12:10 AM

The set of tasks for which "correctness" is formally verifiable (in a way that doesn't put Goodhart's Law in hyperdrive) is vanishingly small.

TheOtherHobbes • yesterday at 11:04 PM

"stochastic genius"
