Hacker News

doph yesterday at 3:06 PM

All tokens are symbols. All of the frontier models speak Mandarin.


Replies

boothby yesterday at 5:03 PM

This is why misspellings and homophones are tells of human righting. LLMs strongly prefer word-level tokens, and word substitutions follow semantic similarity and not the more human auditory similarity.
