FWIW, on a 64GB M1 Max I have not seen any advantage from oMLX specifically, or MLX generally, over GGUF with llama.cpp.
The Gemma 4 MLX builds I have found so far have been slower at the same quantisation and much slower with MTP.
The built-in web UI for llama.cpp is really quite good once you have chosen your model. Otherwise I quite like LM Studio for tinkering.
One thing I would say is that neither Gemma 4 nor Qwen 3.6 needs a large chunk of the typical opencode system prompt; both are better off without it.