hodgehog11 • yesterday at 10:58 PM

This argument, that LLMs can develop crazy new strategies using RLVR on math problems (like what happened with chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore it better, probably with human feedback.

https://arxiv.org/abs/2504.13837

Replies

narrator • yesterday at 11:44 PM

The search space for the game of Go was also thought to be too large for computers to manage.
