I don't think you're going to get many "true" answers to this. The opportunity cost of not using the latest and best models is just too much right now.
Every month I research this and come to the same conclusion: the time, effort, and cost required to get local models (and the coding tools around them) to perform even close to Claude Code with sonnet/opus just not worth it right now. If it was, it would be distributive enough to be in the news.
Not that I'm discounting someone hasn't already solved this, just trying to Occam razor my way out of diving too deep down rabbit holes.
At some point, there will come a saturation point for that "Opportunity cost FOMO train ride", and I think we are already past that point. Mythos class models are a whole different beasts and cutting edge on reasoning but not much use for the problem domains most developers are trying to solve.
The present Sonnet/Opus versions (~4.8) will likely be what everyone in the enterprise might end up using eventually. And even though local models aren't there yet, there are budget alternatives from the families of DeepSeek, Kimi, GPT, MiniMax, etc. available through APIs of NVidida, OpenRouter, Groq, etc. which are very much Sonnet grade.
But you're pretty much measuring opportunity cost in tokens per second, no?
I think it strongly remains to be seen whether e.g. tokens per second (multiplied or whatever by percieved quality of private model) actually means "better or more useful output."
I strongly suspect it does not. (though I also strongly suspect this will be very difficult to measure because the incentive to lie about metrics here will be so strong.)
This seems to be the answer. Building a rig with a decent graphics card will cost $2k+ and will produce sub-par results. Might as well milk the $100/m Claude sub until open-source alternatives reach parity with today's frontier models.