Hacker News

discordance today at 3:05 AM

Could you please share your time to first token and tok/s?


Replies

isomorphic today at 5:25 AM

M4 Pro 64GB (14 CPU / 20 GPU), Gemma 4 31B Q4_K_M GGUF, LM Studio: time to first token 0.92s, 11.56 tokens/s.

Edit: For comparison with the other poster, same setup as above, but with Gemma 4 31B Instruct 8bit MLX (not sure if exactly the same model): time to first token 4.62s, 7.20 tokens/s; with a different prompt, 1.17s and 7.24 tokens/s.
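
For anyone comparing numbers across tools, here is a minimal sketch of how the two metrics quoted in this thread can be derived from timestamps. It assumes that time to first token is the delay from sending the request to receiving the first token, and that tokens/s counts only tokens produced after the first one (decode speed). LM Studio and MLX may define or measure these differently, and the names `throughput_metrics` and `measure_stream` are hypothetical helpers, not part of either tool.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def throughput_metrics(request_time: float,
                       token_times: Sequence[float]) -> Tuple[float, float]:
    """Return (time_to_first_token_s, decode_tokens_per_s).

    request_time: timestamp when the prompt was submitted.
    token_times:  arrival timestamp of each generated token, in order.
    """
    if not token_times:
        raise ValueError("no tokens were generated")
    ttft = token_times[0] - request_time
    # Assumption: decode speed excludes the first token, since its latency
    # is dominated by prompt processing rather than generation.
    decode_time = token_times[-1] - token_times[0]
    decoded = len(token_times) - 1
    tok_per_s = decoded / decode_time if decode_time > 0 else float("nan")
    return ttft, tok_per_s


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter
                   ) -> Tuple[float, float]:
    """Time any token iterator (e.g. a streaming response) as it is consumed."""
    request_time = clock()
    token_times = [clock() for _ in tokens]
    return throughput_metrics(request_time, token_times)
```

Passing any streaming token iterator to `measure_stream` gives both figures from a single run. As the two 8bit MLX runs above suggest, time to first token can vary a lot with the prompt while tokens/s stays roughly constant.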

ls612 today at 3:29 AM

I’m on an M2 Max and get 10 tok/s with Gemma 4 8bit MLX