logoalt Hacker News

terhechtetoday at 4:49 PM1 replyview on HN

Thank you for NeoVim! I also use it every day, mostly for thinking / text / markdown though these days.

Have you compared against MLX? Sometimes I’m getting much faster responses but it feels like the quality is worse (eg tool calls not working, etc)


Replies

tarrudatoday at 5:35 PM

> Have you compared against MLX?

I don't think MLX supports similar 2-bit quants, so I never tried 397B with MLX.

However I did try 4-bit MLX with other Qwen 3.5 models and yes it is significantly faster. I still prefer llama.cpp due to it being a one in all package:

- SOTA dynamic quants (especially ik_llama.cpp) - amazing web ui with MCP support - anthropic/openai compatible endpoints (means it can be used with virtually any harness) - JSON constrained output which basically ensures tool call correctness. - routing mode