Qwen 3.6 35B running on oMLX 0.3.9rc1: on oMLX I get 86 t/s on Q4 and 74 t/s on Q6.
Bear in mind that ttft on MLX is much much faster on M5 Pro as compared to M4 Pro.
Also bear in mind that those figures are with NO optimizations whatsoever: no MCP, no DFlash. I am waiting for both to be released for the Qwen models.
Great, thanks! :-) and to mirror another poster: what kind of prompt parsing (prefill) speed do you get for that model? Also how is the speed for the 27B model?