Hacker News

maxignol, yesterday at 8:02 PM

I couldn't find how many tokens per second he achieved with this setup. Did I miss it?


Replies

aetherspawn, yesterday at 11:18 PM

80 tok/s, which is kind of a lot for GLM. In my experience, running other LLMs at 80 tok/s seems faster than cloud inference, but that obviously depends on what you use; in my case, ChatGPT.