aetherspawn • yesterday at 11:18 PM

That's 80 tok/s, which is kind of a lot for GLM. In my experience, running other LLMs at 80 tok/s seems faster than cloud inference, though that obviously depends on which service you use; in my case, ChatGPT.
