80 tok/s, which is kind of a lot for GLM. My experience running other LLMs at 80 tok/s is that it generally seems faster than cloud inference, but that obviously depends on what you use; in my case, ChatGPT.