
bee_rider · last Monday at 5:13 AM

This is tangential because the task was to come up with the riddle, not solve it.

But, do reasoning models usually do this poorly?

It comes up with a valid solution, SAGE, then disqualifies it for incomprehensible reasons.

Then it discovers that SAGE works if it “reads it carefully.” But then it seems to disqualify it(?), or at least goes on to list other words for some reason.

Then it comes up with SAME, a word… with exactly the same shape as SAGE, just with the irrelevant letter swapped out.

What is going on here? Is it programmed to constantly second-guess itself to make it better at finding weaknesses in its answers to harder riddles? But since it doesn’t know how to accept a good answer, it seems like it is just rolling the dice and then stopping at a random point.

I guess it is technically right, but the logic is a total mess.


Replies

yorwba · last Monday at 8:23 AM

The model isn't explicitly programmed to constantly second-guess itself, but when you do reinforcement learning with verifiable rewards (RLVR) where only the final answer is verified, even completely nonsensical reasoning can accidentally be rewarded if it gives correct results often enough.
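To make “only the final answer is verified” concrete, here is a minimal sketch of what an answer-only reward check might look like (the `Final answer:` format and the regex are assumptions for illustration, not any particular lab's setup):

```python
import re

def rlvr_reward(model_output: str, gold_answer: str) -> float:
    """Reward 1.0 if the extracted final answer matches the reference,
    0.0 otherwise. Everything before the final answer -- coherent or
    nonsensical -- is never looked at."""
    match = re.search(r"Final answer:\s*(\S+)", model_output)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().upper() == gold_answer.upper() else 0.0

# A rambling, self-contradictory chain of thought still gets full reward
# as long as it lands on the right word at the end.
messy = "SAGE works... no wait, disqualify it... maybe SAME? Final answer: SAME"
print(rlvr_reward(messy, "SAME"))  # 1.0
```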

For example, if the model can generate multiple candidate solutions that are all equally likely (or unlikely) to be correct, it doesn't matter whether it stops at the first one or keeps going and stops at a random later one. But if the model can pick the correct solution from multiple candidates better than choosing uniformly at random, generating more candidates becomes an advantage, even if it sometimes results in discarding a correct solution in favor of another one.
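A toy simulation of that argument (the probabilities are made-up assumptions, not measurements of any real model): each candidate is correct with some fixed probability, and the only thing that matters for reward is which candidate the model finally commits to.

```python
import random

P_CORRECT = 0.3   # chance any single candidate is correct (assumption)
SELECT_ACC = 0.6  # chance the model picks a correct candidate when one exists
TRIALS = 200_000

def stop_at_first():
    """Commit to the first candidate generated."""
    return random.random() < P_CORRECT

def pick_uniform(k):
    """Generate k candidates, then pick one uniformly at random."""
    candidates = [random.random() < P_CORRECT for _ in range(k)]
    return random.choice(candidates)

def pick_better_than_chance(k):
    """Generate k candidates; pick a correct one with prob SELECT_ACC
    whenever at least one correct candidate exists."""
    candidates = [random.random() < P_CORRECT for _ in range(k)]
    return any(candidates) and random.random() < SELECT_ACC

for name, fn in [("stop at first", stop_at_first),
                 ("uniform pick of 4", lambda: pick_uniform(4)),
                 ("better-than-chance pick of 4", lambda: pick_better_than_chance(4))]:
    rate = sum(fn() for _ in range(TRIALS)) / TRIALS
    print(f"{name}: {rate:.3f}")

# Roughly 0.30, 0.30, 0.46: stopping early and picking uniformly from more
# candidates score the same, but once selection beats chance, generating
# (and second-guessing) more candidates raises the expected reward, even
# though a correct earlier answer sometimes gets discarded along the way.
```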