Forgive me if this is a naive assumption, but wouldn't large language models work fundamentally differently for a language that is written largely in symbols? Again, my understanding of Mandarin is limited, if it exists at all.
"飞机" and "airplane" aren't fundamentally different in terms of how they're represented to a computer. Especially for an LLM, where tokenization likely turns each of those into a single token.
All tokens are symbols. All of the frontier models speak Mandarin.