Patrick_Devine • yesterday at 5:47 PM

If you're on a Mac, use the MLX-backend versions, which are considerably faster than the GGML-based versions (including llama.cpp), and you don't need to fiddle with the context size. The models are `qwen3.6:35b-a3b-nvfp4`, `qwen3.6:35b-a3b-mxfp8`, and `qwen3.6:35b-a3b-mlx-bf16`.
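As a rough illustration, here is a minimal Python sketch of picking one of these tags on macOS and sending it a prompt through Ollama's local REST API (`POST /api/generate` on the default port 11434). Only the three tags come from the comment. The precision keys and the helpers `pick_model`, `build_request`, and `generate` are hypothetical, and the sketch assumes a running Ollama server that has already pulled the chosen tag.

```python
import json
import platform
import urllib.request
from typing import Optional

# Tags quoted in the comment; the short precision keys are hypothetical labels.
MLX_TAGS = {
    "nvfp4": "qwen3.6:35b-a3b-nvfp4",
    "mxfp8": "qwen3.6:35b-a3b-mxfp8",
    "bf16": "qwen3.6:35b-a3b-mlx-bf16",
}


def pick_model(precision: str = "nvfp4", system: Optional[str] = None) -> str:
    """Return an MLX-backed tag on macOS (hypothetical helper).

    Raises RuntimeError on other platforms, where you would pick a
    GGML-based tag instead.
    """
    system = system or platform.system()
    if system != "Darwin":
        raise RuntimeError("MLX tags are meant for Macs; use a GGML-based tag")
    return MLX_TAGS[precision]


def build_request(model: str, prompt: str,
                  host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a live server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `generate(pick_model("mxfp8"), "Hello")` would query the MXFP8 build. No context-size option is passed here, reflecting the comment's point that the MLX versions don't need that tuning.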

alt Hacker News