Hacker News

zamadatix 01/21/2025

Not 100% so for chain-of-thought models: they should recognize that they need to spell the word out letter by letter in some separated form and then count the tokens in that form. The Qwen distill seems to do exactly this really well:

> Step-by-step explanation:

> 1. Break down each word: "not", "really", "a", "tokenizer", "issue".

> 2. Count 'e's in each word:

> - "not": 0

> - "really": 1

> - "a": 0

> - "tokenizer": 2

> - "issue": 1

> 3. Sum the counts: 0 + 1 + 0 + 2 + 1 = 4.

>

> Answer: There are 4 E's in the phrase.

In the thought portion it broke the words up every which way you could think of to check, then validated the total by listing the letters in a numbered list by index and comparing that count to the sum of the per-word counts.
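For anyone who wants to check the numbers by hand, here is a rough Python sketch of the same two passes: a per-word count, then a cross-check over an indexed list of letters. It only mirrors the arithmetic in the quoted answer; it says nothing about how the model does it internally, and the variable names are mine.

```python
phrase = "not really a tokenizer issue"
target = "e"

# Pass 1: count the target letter in each word, as in the quoted answer.
per_word = {word: word.lower().count(target) for word in phrase.split()}
total_by_word = sum(per_word.values())

# Pass 2: list every letter with a 1-based index and count the matches.
letters = [ch for ch in phrase.lower() if ch.isalpha()]
indexed = list(enumerate(letters, start=1))
total_by_index = sum(1 for _, ch in indexed if ch == target)

# The two passes should agree, which is the validation described above.
assert total_by_word == total_by_index
print(per_word, total_by_word)
```

Both passes give 4 for this phrase, matching the model's answer.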


Replies

spacemanspiff01 01/21/2025

But the only way to do this is if it is trained on how to map the word token to character tokens, i.e.:

Hello -> h e l l o
66547 -> 12 66 88 88 3
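To make that mapping concrete, here is a toy sketch that reuses the made-up IDs from the example above. These are not real IDs from any tokenizer, and `WORD_TO_ID`, `CHAR_TO_ID`, and `word_token_to_char_tokens` are hypothetical names for illustration only. The point is that counting a letter becomes trivial once the word token has been expanded to character tokens; the hard part is whether the model has learned the expansion.

```python
# Toy vocabularies built from the illustrative IDs in the comment (assumed, not real).
WORD_TO_ID = {"hello": 66547}
CHAR_TO_ID = {"h": 12, "e": 66, "l": 88, "o": 3}


def word_token_to_char_tokens(word_id: int) -> list[int]:
    """Expand a word-level token ID into its character-level token IDs."""
    id_to_word = {i: w for w, i in WORD_TO_ID.items()}
    return [CHAR_TO_ID[ch] for ch in id_to_word[word_id]]


char_ids = word_token_to_char_tokens(66547)

# Counting 'e' is now just counting one ID in the expanded sequence.
e_count = char_ids.count(CHAR_TO_ID["e"])
print(char_ids, e_count)
```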

Or, maybe it memorized that hello has a single e.

Either way, this seems to be an edge case that may or may not exist in the training data, but it seems orthogonal to 'reasoning'.

A better test case would be to see how it performs if you give it the spelling mappings for each word in the context.
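A minimal sketch of what that test prompt could look like, assuming you just prepend a `word -> letters` line for each word before asking the question. `spell` and `build_prompt_with_spellings` are hypothetical helpers, the prompt wording is arbitrary, and no model is called here; you would paste the result into whatever model you are testing.

```python
def spell(word: str) -> str:
    """Return the word as space-separated lowercase letters."""
    return " ".join(word.lower())


def build_prompt_with_spellings(phrase: str, letter: str) -> str:
    """Build a prompt that puts each word's spelling in context before the question."""
    mappings = [f"{word} -> {spell(word)}" for word in phrase.split()]
    question = (
        f"Question: how many times does the letter '{letter}' "
        f'appear in "{phrase}"?'
    )
    return "Spellings:\n" + "\n".join(mappings) + "\n\n" + question


prompt = build_prompt_with_spellings("not really a tokenizer issue", "e")
print(prompt)
```

If the model still miscounts with the spellings handed to it, the failure is in the counting rather than in the token-to-character mapping.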
