SwellJoe • yesterday at 10:33 PM

You can run pretty much every model on Vulkan, including the Qwen MoE models. The same goes for ROCm, Apple Silicon via MLX, and Intel hardware via OpenVINO. Nvidia got there first, but they're no longer clearly dominant in the self-hosting space, simply because of the high cost. I think Apple probably has the lead there, since unified memory lets big models run without multiple big dedicated GPUs, but stuff like Strix Halo with 128GB of unified memory is also pretty much sold out everywhere. That demand makes sense: there's a lower bound on how small a model can be and still be useful.
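
To put rough numbers on the unified memory point, here's a back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight), so it ignores KV cache, activations, and runtime overhead. The 70B model size, the `headroom` fraction, and the function names are my own illustrative assumptions, not measurements from any particular setup.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB.

    params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` of the available memory.

    `headroom` is an assumed fudge factor that leaves room for the KV cache,
    the OS, and everything else; tune it for your own machine.
    """
    return weight_gb(params_billion, bits_per_weight) <= memory_gb * headroom


if __name__ == "__main__":
    # Hypothetical 70B-parameter model at a few common precisions.
    for bits in (16, 8, 4):
        size = weight_gb(70, bits)
        print(f"70B @ {bits}-bit: ~{size:.0f} GB of weights, "
              f"fits in 128 GB: {fits(70, bits, 128)}, "
              f"fits in 24 GB: {fits(70, bits, 24)}")
```

By this estimate a 4-bit 70B model needs about 35 GB for weights, which sits comfortably in 128GB of unified memory but not on a single 24GB card.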

Anyway, I don't have any Nvidia hardware, and I've got several local models running and/or training at all times.
