FWIW I have not, on a 64GB M1 Max, seen any advantage from oMLX specifically or MLX generally...

dofm • yesterday at 9:14 PM • 0 replies • view on HN

FWIW I have not, on a 64GB M1 Max, seen any advantage from oMLX specifically or MLX generally over GGUF with llama.cpp.

The Gemma 4 MLX builds I have found so far have been slower at the same quantisation and much slower with MTP.

The built-in web UI for llama.cpp is really quite good once you have chosen your model. Otherwise I quite like LM Studio for tinkering.

One thing I would say is that both Gemma-4 and Qwen 3.6 simply do not need a large chunk of the typical opencode system prompt. Better off without it.

alt Hacker News