Hacker News

vladgur · yesterday at 7:11 PM · 3 replies

I have used omlx.ai with great success both to download multiple MLX models (including Gemma and Qwen) suited to my hardware and to automagically launch open-source and closed-source harnesses (Claude Code, Codex) using these models. All from a web or desktop UI.

IMHO, you would not need to follow a blog post if you use oMLX.


Replies

dofm · yesterday at 9:14 PM

FWIW I have not, on a 64GB M1 Max, seen any advantage from oMLX specifically or MLX generally over GGUF with llama.cpp.

The Gemma 4 MLX builds I have found so far have been slower at the same quantisation and much slower with MTP.

The built-in web UI for llama.cpp is really quite good once you have chosen your model. Otherwise I quite like LM Studio for tinkering.

One thing I would say is that both Gemma 4 and Qwen 3.6 simply do not need a large chunk of the typical opencode system prompt. They are better off without it.

Dotnaught · yesterday at 8:14 PM

In case anyone is looking for a sandbox to go with oMLX and Pi: https://github.com/Dotnaught/pi-sandbox

fridder · yesterday at 7:14 PM

It truly is the SOTA for local inference on Mac. Even when there are regressions, the dev(s) are insanely responsive. It is the most impressive open-source project I've seen in a while.
