Hacker News

mvkel · 01/22/2025

This is incredibly fascinating.

I feel like one round of RL could potentially fix "short circuits" like these. It seems convinced that a particular rule isn't "allowed" when it's totally fine. Wouldn't that mean you just have to fine-tune it a bit more on its reasoning path?


Replies

byteknight · 01/22/2025

I believe this comes from our verbiage.

If I asked you, "Hey, how many Rs in strawberry?", you're going to tell me 2, because the likelihood is I'm asking about the ending Rs. That's at least how I'd interpret the question without the "LLM test" clouding my vision.

Same if I asked how many Ls in gullible: I'd say, "it's a double L after the u."
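
For what it's worth, a literal character count gives 3 in both cases; a quick Python sketch of the literal count (just for illustration):

    def count_letter(word: str, letter: str) -> int:
        # Case-insensitive literal count of `letter` in `word`.
        return word.lower().count(letter.lower())

    # "strawberry" -> 3 Rs, though the intuitive spelling answer is 2 (the double R)
    # "gullible"   -> 3 Ls, though "a double L after the u" implies 2
    for word, letter in [("strawberry", "r"), ("gullible", "l")]:
        print(word, "has", count_letter(word, letter), letter.upper() + "s")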

My guess is that this has muddled the training data.