I couldn't find how many tokens per second he achieved with this setup. Did I miss it?

maxignol • yesterday at 8:02 PM

Replies

80 tok/s, which is quite a lot for GLM. In my experience running other LLMs at 80 tok/s, it seems faster than cloud inference, but that obviously depends on what you're comparing against; in my case, ChatGPT.
