Sounds like the money grab is accelerating before consumer grade local models are getting good enough for local inference in a few years. Huge house of cards here. Demand keeps skyrocketing until it suddenly drops off entirely with on-device inference.
I'm already living in this future. In a decent execution framework, with context management, memory via Unix, and mechanisms for web search and access, local models are effectively on par with frontier ones. And they can often be much faster. I'll keep paying fees to the AI companies until they stop truly subsidizing and leading. They are getting close to the edge of utility, but we can use their services now to bootstrap their own demise. Long live running your own software on your own computer.
> consumer grade local models are getting good enough for local inference
I am waiting for that. Perhaps a Taalas kind of high-performance custom hardware LLM engine for coding, paired with an open-source coding agent. Priced like a high-end graphics card, which would pay off over time. It will be a replay of the IBM mainframe to PC transition of a previous era.
The consumer models are quite good already; the main bottleneck on local inference is hardware. But even then you can run tiny models on almost anything. Things only get harder as you try to scale up to more knowledgeable models and a larger context.
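To put rough numbers on the hardware point, here is a back-of-the-envelope sketch of how memory grows with model size and context. It only counts quantized weights plus the key/value cache of a standard transformer, and ignores activations and runtime overhead. The two model shapes (`tiny`, `large`) are hypothetical examples picked for illustration, not the specs of any particular model.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone at a given quantization width."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one key and one value vector per layer, per KV head,
    per token (hence the leading factor of 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def estimate_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                 head_dim, context_len) -> float:
    """Rough lower bound in GiB: weights + KV cache, nothing else."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len)
    return total / GIB

# Hypothetical model shapes, for illustration only.
tiny = dict(n_params=1e9, bits_per_weight=4, n_layers=16,
            n_kv_heads=8, head_dim=64)
large = dict(n_params=70e9, bits_per_weight=4, n_layers=80,
             n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for name, shape in (("tiny", tiny), ("large", large)):
        for ctx in (8_192, 131_072):
            gib = estimate_gib(context_len=ctx, **shape)
            print(f"{name:5s} ctx={ctx:>7d}: ~{gib:.1f} GiB")
```

The weights term is fixed per model, while the cache term grows linearly with context length, which is why long contexts hurt even when the model itself fits.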