Aurornis | yesterday at 8:57 PM | 1 reply

So everyone is aware: you can already run Qwen3.5-27B on Vulkan or on Apple hardware. Every major inference engine supports it right now.

This repo is a vibecoded demo implementation of some recent research papers, combined with optimizations that sacrifice quality for speed to get a big number that looks impressive. The 207 tok/s figure they're claiming appears only in the headline; the results they actually show are half that or less, so I already don't trust anything else they say they accomplished.

If you want to run Qwen3.5-27B, you can do it today with llama.cpp on CUDA, Vulkan, Apple Metal, or even CPU.
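
For anyone who wants to try, here's a minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python). The GGUF filename is a placeholder; point it at whatever quant of the model you actually downloaded:

    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3.5-27b-q4_k_m.gguf",  # placeholder filename
        n_gpu_layers=-1,  # offload all layers to whatever backend the wheel was built with (CUDA/Vulkan/Metal); set to 0 for CPU-only
        n_ctx=4096,       # context window size
    )

    out = llm("Explain speculative decoding in one paragraph.", max_tokens=256)
    print(out["choices"][0]["text"])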


Replies

Grimblewald | yesterday at 10:27 PM

This. Even on Android via Termux you can run Ollama with GPU acceleration on a phone. It works, though mileage will vary.
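
Once "ollama serve" is up inside Termux, you talk to it the same way as on desktop. A minimal sketch against Ollama's standard HTTP API, using only the stdlib (the model tag is a placeholder; a 27B model won't fit in most phones' RAM, so pick a small quant):

    import json
    import urllib.request

    # Build a non-streaming generate request against the local Ollama server.
    body = json.dumps({
        "model": "qwen2.5:3b",  # placeholder tag; choose something your phone's RAM can hold
        "prompt": "Say hello from a phone.",
        "stream": False,
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default port
        data=body,
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])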