I have used omlx.ai with great success to both download multiple mlx models (including gemma and qwen) suited for my hardware AND to be able to automagically launch both open-source and close-source (claude code, codex) harnesses using these models. All from a web or desktop UI
You would not need to follow a blog post with omlx IMHO
In case anyone is looking for a sandbox to go with oMLX and Pi: https://github.com/Dotnaught/pi-sandbox
It truly is the SOTA for local inference on mac. Even when there are regressions the dev(s) are insanely responsive. It is the most impressive opensource project I've seen in a awhile
FWIW I have not, on a 64GB M1 Max, seen any advantage from oMLX specifically or MLX generally over GGUF with llama.cpp.
The Gemma 4 MLX builds I have found so far have been slower at the same quantisation and much slower with MTP.
The built-in web UI for llama.cpp is really quite good once you have chosen your model. Otherwise I quite like LM Studio for tinkering.
One thing I would say is that both Gemma-4 and Qwen 3.6 simply do not need a large chunk of the typical opencode system prompt. Better off without it.