nozzlegear • yesterday at 9:07 PM

I'm using 4-bit as well, with the MoE model. I also use the MLX versions, which are optimized for Apple silicon (from what I understand anyway, I'm just an LLM layman). According to my oMLX dashboard, I'm getting about 50 tokens per second out of this model. That's not blazing fast, but it's more than fast enough to be useful to me.

https://huggingface.co/mlx-community/Qwen3.6-35B-A3B-OptiQ-4...
