Hacker News

hodgehog11 yesterday at 10:58 PM

The argument that LLMs can develop radically new strategies through RLVR on math problems (as happened with chess) turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help exploring it, probably from human feedback.

https://arxiv.org/abs/2504.13837


Replies

narrator yesterday at 11:44 PM

The search space for the game of Go was also thought to be too large for computers to manage.
