Unfortunately, there's really no clear path to viable local models for the common folk; the hardware requirements are just too extreme. I say this as someone with a pair of A100s who is absolutely delighted by what the open source models are capable of, but even with the best harnesses, tiny quantized models are not even in the same league as something like Kimi-k2.
Of course, here on HN it's easy to find folks who get a lot out of tinkering with tiny models, but the masses don't want to tinker with toys. They want something fast, with a large context, that at least approximates Opus 4.6-level reliability and capability, and that simply can't be squeezed into a quantized 60B model.
Right now, yes, but I am fairly confident this will change. Not only do I truly believe we will see massive efficiency gains in inference, I also believe the cost of hardware will come down. Again, Nvidia is getting a 75% margin on this hardware, and hardware margins are usually significantly smaller. More supply will come online, even if that takes years.