PhilippGille • 12/10/2025 • 1 reply • view on HN

> He has GLM 4.5 Running at ~100 Tokens per second.

GLM 4.5 Air, to be precise. It's a smaller 106B model, not the full 355B one.

Worth mentioning when discussing token throughput.

Replies

I'm downloading DeepSeek-V3.2-Speciale now at FP8 (reportedly gold-medal-level performance at the 2025 International Mathematical Olympiad and International Olympiad in Informatics).

It will fit in system RAM, and as it's a mixture-of-experts model and the experts are not too large, I can at least run it. Tokens per second will be lower, but system memory bandwidth is somewhere around 500-600 GB/s, so it should feel OK.
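For a rough sense of what "OK" might mean, here is a back-of-the-envelope sketch. It assumes decoding is purely memory-bandwidth bound, that every generated token streams the active expert weights from RAM once, and that the model activates about 37B parameters per token (the figure published for DeepSeek-V3, assumed here to carry over). It also reads the bandwidth figure as 500 to 600 GB/s. The function name is made up for illustration, and real throughput will be lower once KV-cache reads, attention compute, and NUMA effects are included.

```python
def decode_tokens_per_second(active_params_billion: float,
                             bytes_per_param: float,
                             bandwidth_gb_per_s: float) -> float:
    """Upper bound on decode speed if each token must read all active weights once."""
    # GB that have to be streamed from memory for one generated token
    gb_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_per_s / gb_per_token


# Assumption: ~37B active parameters per token (DeepSeek-V3 figure, assumed unchanged)
ACTIVE_PARAMS_B = 37.0
# FP8 stores one byte per parameter
FP8_BYTES_PER_PARAM = 1.0

for bandwidth in (500.0, 600.0):
    tps = decode_tokens_per_second(ACTIVE_PARAMS_B, FP8_BYTES_PER_PARAM, bandwidth)
    print(f"{bandwidth:.0f} GB/s -> at most ~{tps:.1f} tokens/s")
```

Under those assumptions the ceiling lands at roughly 13 to 16 tokens per second, which is why the active parameter count matters more than the total model size when comparing throughput numbers.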
