But how smart is it? All the people running local models never seem to mention that they are way dumber than cloud models.
I don't care how many tokens per second of nonsense it can generate.
> But how smart is it? All the people running local models never seem to mention that they are way dumber than cloud models.
Well, you aren't going to give it a 20k-line spec and have it churn out a full app after 4 hours.
But, you can get it to write code for you if you do the design.
Quantized Gemma 4 26B is as smart as or better than GPT 5 in most of my testing. Granted, GPT 5 is nearly a year old at this point, but I can run Gemma 4 on a ~6-year-old consumer GPU (RTX 3090) and get 140 t/s.
It is smart enough that I use it for all my coding tasks, and a lot of other mundane tasks.
It is probably not smart enough for "design this whole architecture of this complex system from scratch, make no mistakes", but that is not something I want from a coding tool anyway. I want a model that I can point to a file and tell to make some changes to that file and related files, or that I can ask to review a PR with regard to certain aspects.
My suggestion is to simply try it and see what it feels like.
It's not going to be as good as Claude, but if you know what you're doing, it may be good enough to get your work done.
Qwen 3.6 35b a3b is about as good as Sonnet 4.5. It varies, but it's at that level.