logoalt Hacker News

egorfinetoday at 3:55 PM1 replyview on HN

Qwen 3.6 35B running on oMLX 0.3.9rc1: on oMLX I get 86 t/s on Q4 and 74 t/s on Q6.

Bear in mind that ttft on MLX is much much faster on M5 Pro as compared to M4 Pro.

Also bear in mind that those figures are with NO optimizations whatsoever: no MCP, no DFlash. I am waiting for both to be released for the Qwen models.


Replies

busfahrertoday at 7:43 PM

Great, thanks! :-) and to mirror another poster: what kind of prompt parsing (prefill) speed do you get for that model? Also how is the speed for the 27B model?