Could you please share your time to first token and tok/s?

discordance • today at 3:05 AM • 2 replies

Replies

M4 Pro 64GB (14 CPU / 20 GPU), Gemma 4 31B Q4_K_M GGUF, LM Studio: time to first token 0.92s, 11.56 tokens/s.

Edit: For comparison with the other poster, same setup as above, but with Gemma 4 31B Instruct 8bit MLX (not sure if exactly the same model): time to first token 4.62s, 7.20 tokens/s; with a different prompt, 1.17s and 7.24 tokens/s.
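
For anyone who wants to reproduce these two numbers outside LM Studio, here is a minimal sketch of how they can be measured from any streaming token iterator. The function name `measure_stream` and the injectable `clock` parameter are hypothetical, and the tok/s definition used here (tokens after the first, divided by the time between the first and last token) is an assumption; LM Studio may compute its figure differently.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (time_to_first_token_s, tokens_per_second, token_count).

    Hypothetical helper: `tokens` is any iterable that yields tokens as
    they are generated, e.g. a streaming response from a local server.
    """
    start = clock()          # request is considered sent here
    first = None             # timestamp of the first token
    last = start             # timestamp of the most recent token
    count = 0

    for _ in tokens:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1

    if first is None:
        raise ValueError("stream produced no tokens")

    ttft = first - start
    decode_time = last - first
    # Assumed definition: rate over the decode phase only, so prompt
    # processing time does not drag down the tok/s figure.
    tps = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return ttft, tps, count
```

To use it, pass in whatever generator your client exposes for streamed output. Keeping prompt processing out of the rate is a deliberate choice: it lets time to first token vary with the prompt (as with 4.62s vs 1.17s above) while tok/s stays roughly constant.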


ls612 • today at 3:29 AM

I’m on an M2 Max and get 10 tok/s with Gemma 4 8bit MLX
