> within a few years we will be running local models as good as today’s frontier
Unless there is some important breakthrough in hardware production or model architecture, it's quite the opposite: bigger, more expensive, and more energy-intensive hardware is needed today compared to one or two years ago.
Per frontier token. You're not calculating the cost of a fixed-quality asset here. Old hardware running non-frontier models will be very valuable. In fact, we have two direct examples: older server GPUs actually appreciating in price, and the very obvious fact that not everyone always uses the MAX FULL EFFORT BEST MODEL no matter what.
As good as today's frontier. Gemma 4 today is roughly equivalent to the frontier of a year and a half ago, at the GPT-4o tier.
I can run qwen3.6-27b on a four-year-old MacBook Pro, and it dominates ChatGPT-4o (the frontier model from two years ago) and is competitive with early ChatGPT-5 versions. We are also getting a lot smarter about using and deploying these local models. Your entire AI stack from two years ago would be absolutely crushed by today's local LLMs on a high-end local inference system, combined with a good modern coding agent.