Hacker News

xiphias2 01/20/2025

It's funny because this simple exercise shows all the problems I have using the reasoning models: they produce a long chain of reasoning that takes too much time to verify and still can't be trusted.


Replies

byteknight 01/20/2025

I may be looking at this too deeply, but I think this suggests that the reasoning is not always utilized when forming the final reply.

For example, IMMEDIATELY, in its first section of reasoning, where it starts counting the letters:

> R – wait, is there another one? Let me check again. After the first R, it goes A, W, B, E, then R again, and then Y. Oh, so after E comes R, making that the second 'R', and then another R before Y? Wait, no, let me count correctly.

1. During its counting process, it repeatedly finds three "r"s (at positions 3, 8, and 9; see the quick check after this comment)

2. However, its intrinsic knowledge that "strawberry" has "two Rs" keeps overriding this direct evidence

3. This suggests there's an inherent weight given to the LLM's intrinsic knowledge that takes precedence over what it discovers through step-by-step reasoning

To me that suggests an inherent weight (unintended pun) given to its "intrinsic" knowledge, as opposed to what is presented during the reasoning.
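
A quick sanity check on point 1, plain Python only, nothing to do with how the model itself processes the word:

```python
# Count the "r"s in "strawberry" and report their 1-based positions.
word = "strawberry"
positions = [i + 1 for i, ch in enumerate(word) if ch == "r"]
print(positions)       # [3, 8, 9]
print(len(positions))  # 3
```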

naasking 01/21/2025

Strawberry is "difficult" not because the reasoning is difficult, but because tokenization doesn't let the model reason at the level of characters. That's why it has to work so hard and doesn't trust its own conclusions.
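
For a concrete illustration, here is a rough sketch using the tiktoken library (assuming it is installed; the exact split depends on the encoding chosen) showing that the model works over subword chunks rather than letters:

```python
# Rough sketch: inspect how a BPE tokenizer chunks "strawberry".
# Assumes the tiktoken package is installed; the exact split varies by encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("strawberry")
pieces = [enc.decode([t]) for t in token_ids]
print(token_ids)  # integer IDs -- all the model ever "sees"
print(pieces)     # subword chunk(s), not individual letters
```

Whatever the split turns out to be, counting letters forces the model to reconstruct character-level structure it never directly observes.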
