Sounds like you're a candidate for a local model. It's kinda nice not caring about the token count except where compaction is concerned.
I do love using local models when I can, but qwen-35B is the best model I can run, and while it's an insanely good local model, it doesn't compare to the big ones.
Not paying per token? Not sending my code to someone else's servers for inference? That's the stuff of sweet dreams for a stingy, paranoid solopreneur like me.
If I could run a local model comparable to even Sonnet 4.6 without shelling out $50K on hardware, I'd do it in a heartbeat. But all I have is 32 GB of RAM and an old RTX 4080.
Or am I not up to speed? Are there decent coding models that can run on dev laptops? Not that that's necessarily what you were suggesting by recommending a local model; just curious.