Strawberry is "difficult" not because the reasoning is difficult, but because tokenization doesn't let the model reason at the level of characters. That's why it has to work so hard and doesn't trust its own conclusions.
Yeah, but it clearly breaks down the spelling correctly in its reasoning, e.g. one letter per line. So it gets past the tokenization barrier, but still gets hopelessly confused.
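To make the two views concrete, here is a rough sketch of the difference between seeing "strawberry" as subword tokens and seeing it spelled out one letter at a time. The token split below is hypothetical and chosen only for illustration; a real tokenizer may split the word differently, and the model sees opaque token IDs rather than these strings.

```python
word = "strawberry"

# Hypothetical subword split, for illustration only (an assumption,
# not the output of any particular tokenizer).
tokens = ["str", "aw", "berry"]

# Character-level view: one letter per unit, like the spelled-out
# breakdown the model writes in its reasoning.
letters = list(word)


def count_letter(units, target):
    """Count occurrences of `target` across a sequence of string units."""
    return sum(unit.count(target) for unit in units)


r_from_letters = count_letter(letters, "r")
r_from_tokens = count_letter(tokens, "r")

print(len(tokens), "token-level units vs", len(letters), "character-level units")
print("r count from letters:", r_from_letters)
print("r count from tokens: ", r_from_tokens)
```

Ordinary code can look inside each token string, so both counts agree. A model working over token IDs cannot do that directly, which is the first comment's point. Once the letters are on separate lines, the count is a simple tally, which is the reply's point: any remaining confusion is not explained by tokenization alone.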