I am running the Qwen 3.6 9B quantized model on my M4 Pro 48GB, and it is barely useful for basic pi.dev/cc-driven development. I think 128GB desktops are the sweet spot for actually getting meaningful work done. However, getting your hands on one of those machines is difficult at the moment.
As much fun as it is to run these things locally, don't forget that your time is not free. I am slowly migrating my use cases to OpenRouter and running the largest Qwen model for under $2-3/day with serious use on personal projects.
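For anyone curious about the switch: OpenRouter exposes an OpenAI-compatible API, so it's mostly a base-URL change. A minimal sketch in Python; the model slug and API key here are placeholders, not a recommendation (check openrouter.ai for the actual ID of whichever Qwen model you want):

    from openai import OpenAI

    # OpenRouter speaks the OpenAI chat-completions protocol at this base URL.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_API_KEY",  # placeholder: key from your OpenRouter account
    )

    # The slug below is a placeholder; substitute the largest Qwen model listed.
    resp = client.chat.completions.create(
        model="qwen/your-chosen-qwen-model",
        messages=[{"role": "user", "content": "Review this diff for bugs: ..."}],
    )
    print(resp.choices[0].message.content)

Same client library, same call shape as pointing at OpenAI directly, which is what makes the migration cheap.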
Thanks for saying this. There's so much nonsense online about local models being better than Opus 4.7 and the like. It's just not true for regular users.
I have a brand-new, fully specced M5 MacBook Pro, and I've tried local models; they're barely functional.
I'm using the 30B MoE model on the same spec with a 65k-token context as a subagent with tooling, and it absolutely writes decent code. The dense 9B, I agree, wasn't great. Rough shape of the setup below.
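If it helps anyone reproduce this: llama.cpp's llama-server exposes the same OpenAI-compatible API locally, so the subagent just points at localhost. A rough sketch of the shape; the port, model name, and prompt are assumptions about my setup (tool wiring omitted for brevity):

    from openai import OpenAI

    # Assumes llama-server is already running the 30B MoE GGUF with a 65k
    # context (-c 65536) on port 8080; it serves an OpenAI-compatible /v1.
    local = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    # llama-server hosts a single model, so the model field is effectively
    # ignored; any string works here.
    resp = local.chat.completions.create(
        model="local",
        messages=[{"role": "user", "content": "Write a unit test for parse_config()."}],
        max_tokens=1024,
    )
    print(resp.choices[0].message.content)

The main agent dispatches scoped tasks like this to the local endpoint and keeps the expensive hosted model for planning.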
How does it (the OpenRouter version) compare to ChatGPT 5.5 or Claude Opus 4.6?
Was the choice of such a small model driven by a desire for high tok/sec? I ask because an M4 Pro 48GB machine can run larger models (if model intelligence is what would make it more useful).