logoalt Hacker News

PhilippGille12/10/20251 replyview on HN

> He has GLM 4.5 Running at ~100 Tokens per second.

GLM 4.5 Air, to be precise. It's a smaller 166B model, not the full 355B one.

Worth mentioning when discussing token throughput.


Replies

dnhknglast Thursday at 7:05 AM

I'm downloading DeepSeek-V3.2-Speciale now at FP8 (reportedly Gold-medal performance in the 2025 International Mathematical Olympiad and International Olympiad in Informatics).

It will fit in system RAM, and as its mixture of experts and the experts are not too large, I can at least run it. Token/second speed will be slower, but as system memory bandwidth is somewhere around 5-600Gb/s, so it should feel OK.

show 1 reply