versteegen • 01/21/2025

I believe DeepSeek models do split numbers up into individual digits, and this provides a large boost to their ability to do arithmetic. I would hope that it's the standard now.

Replies

maxrmk • 01/22/2025

Could be the case; I’m not familiar with their specific tokenizers. IIRC Llama 3 tokenizes numbers in chunks of three digits. That seems better than the arbitrarily sized chunks you get from plain BPE, but still kind of odd. The embedding layer has to learn the semantics of 1000 different number tokens, some of which overlap in meaning in some cases and not in others, e.g. 001 vs. 1.
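To make the difference concrete, here is a minimal sketch of the two splitting schemes discussed above. It is not the actual pre-tokenizer of DeepSeek or Llama 3; the function names are made up for illustration, and the left-to-right grouping of digits is an assumption.

```python
import re

def split_single_digits(text):
    # Hypothetical helper: every digit becomes its own piece,
    # runs of non-digit characters are kept whole.
    return re.findall(r"\d|\D+", text)

def split_three_digit_chunks(text):
    # Hypothetical helper: digits are grouped left to right into
    # chunks of up to three (assumed grouping direction).
    return re.findall(r"\d{1,3}|\D+", text)

print(split_single_digits("1234567"))       # ['1', '2', '3', '4', '5', '6', '7']
print(split_three_digit_chunks("1234567"))  # ['123', '456', '7']

# "001" and "1" end up as different pieces under three-digit chunking,
# so a model would need a separate embedding for each.
print(split_three_digit_chunks("001"))      # ['001']
print(split_three_digit_chunks("1"))        # ['1']
```

With single digits the vocabulary only needs the ten pieces 0 through 9, while full three-digit chunks alone account for 10**3 = 1000 distinct pieces, which is where the "1000 different number tokens" figure comes from.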
