Davidzheng • last Tuesday at 11:56 PM • 1 reply

OK, but if the verification loop really makes the agents MUCH more useful, then this usefulness difference can be used as a training signal to improve the agents themselves. So the current capability levels are certainly not going to remain for very long (which is also what I expect, but I would like to point out that it's also supported by this).

Replies

hamiecod • yesterday at 7:49 AM

That's a strong RL technique that could equal the quality of RLHF.
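
To make the idea in this thread concrete, here is a minimal toy sketch (not something either commenter posted) of using a verifier's pass/fail result as the reward in a REINFORCE-style policy-gradient update. Everything in it is an assumption for illustration: the "agent" is just a softmax over four candidate actions, `verifier` is a hypothetical stand-in for a real check such as running a test suite, and the learning rate and baseline decay are arbitrary.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verifier(action):
    """Hypothetical verifier: only action 2 'passes' (reward 1.0), all others fail."""
    return 1.0 if action == 2 else 0.0


def train(steps=2000, lr=0.1, n_actions=4, seed=0):
    """Toy REINFORCE loop that uses the verifier outcome as the reward signal."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions   # the toy "agent": one logit per candidate action
    baseline = 0.0               # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = verifier(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. each logit is 1[a == action] - probs[a].
        for a in range(n_actions):
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

In this toy setting the policy shifts its probability mass toward the action the verifier accepts. Whether the same signal scales to real agents, or matches RLHF in quality, is the open question the comments above are speculating about.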
