I've been running qwen3-5-9b-q4-k-m and qwen3-6-27b-q6-k simultaneously on an Intel Arc Pro B70 with a lot of success.
https://github.com/cptskippy/battlemage-llm-gateway
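For a rough idea of why both models can share one card, here's a back-of-the-envelope estimate of the weight memory alone. It's a sketch under assumptions: the parameter counts are the nominal 9B and 27B implied by the model names, and the bits-per-weight figures are approximate llama.cpp averages (Q6_K is 210 bytes per 256-weight block; Q4_K_M mixes tensor types, so ~4.85 is only a ballpark). KV cache, activations, and runtime overhead are not counted, so real usage will be higher.

```python
# Rough weight-only memory estimate for the two quantized models.
# ASSUMPTIONS: nominal parameter counts taken from the model names (9B, 27B);
# bits-per-weight values are approximate llama.cpp averages.
# KV cache, activations and runtime overhead are NOT included.

APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,   # mix of Q4_K/Q6_K tensors; approximate average only
    "Q6_K": 6.5625,   # 210 bytes per 256-weight block
}


def weight_gb(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (1e9 bytes)."""
    total_bits = n_params * APPROX_BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 1e9


models = {
    "qwen3-5-9b-q4-k-m": (9e9, "Q4_K_M"),
    "qwen3-6-27b-q6-k": (27e9, "Q6_K"),
}

sizes = {name: weight_gb(n, q) for name, (n, q) in models.items()}
total = sum(sizes.values())

if __name__ == "__main__":
    for name, gb in sizes.items():
        print(f"{name}: ~{gb:.1f} GB")
    print(f"total weights: ~{total:.1f} GB (plus KV cache and overhead)")
```

Whatever VRAM is left after the weights is what's available for context (KV cache) across both models.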
Opencode has been a huge productivity accelerator. I have two Hermes agents that I'm training to support my workflow, with pretty good success. One is a personal assistant that manages my backlog, keeps me on task, follows up with me on items, and puts together research briefs. The other I use as a general-purpose coder and researcher, and it's been about 50:50 on the tasks I've given it. In fairness, though, the task it failed at left me scratching my head too.
Does Intel make decent GPUs now? I must be out of the loop...
What's the value of running the smaller model too? Why not just use the big model for everything? I note both are dense, as well.
Interesting setup, thx for sharing.
How many tokens/sec do you get with 27b? Are you using MTP?