Hacker News

aetherspawn, yesterday at 11:18 PM

80 tok/s, which is kind of a lot for GLM. My experience running other LLMs at 80 tok/s is that it seems roughly faster than cloud inference, but that obviously depends on what you use; in my case, ChatGPT.