Models plateaued on training compute and (active) parameter counts a while ago; it's tooling / harnesses that are making the big jumps in performance happen. And then you have things like DeepSeek with a pretty small KV cache.
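For a rough sense of what "pretty small" means: here's a back-of-envelope sketch comparing KV cache per token under standard multi-head attention vs. DeepSeek-style Multi-head Latent Attention (MLA). The config numbers are the ones published for DeepSeek-V3 (61 layers, 128 heads of dim 128, a 512-dim compressed KV latent plus a 64-dim decoupled RoPE key), quoted from memory, so treat them as illustrative rather than authoritative.

```python
# Back-of-envelope: KV cache bytes per token, standard MHA vs. MLA.
# Figures assumed from the DeepSeek-V3 paper; illustrative only.

LAYERS = 61        # transformer layers
N_HEADS = 128      # attention heads
HEAD_DIM = 128     # per-head dimension
KV_LATENT = 512    # MLA compressed KV latent dim (d_c)
ROPE_DIM = 64      # MLA decoupled RoPE key dim
BYTES = 2          # fp16/bf16

# Standard MHA caches a full K and a full V vector per head, per layer.
mha = LAYERS * 2 * N_HEADS * HEAD_DIM * BYTES

# MLA caches only the shared latent plus the RoPE key, per layer.
mla = LAYERS * (KV_LATENT + ROPE_DIM) * BYTES

print(f"MHA: {mha/1024:.0f} KiB/token, MLA: {mla/1024:.0f} KiB/token, "
      f"ratio: {mha/mla:.0f}x")
# -> MHA: ~3904 KiB/token, MLA: ~69 KiB/token, ratio: ~57x
```

That's why long-context serving is so much cheaper on that architecture: the cache, not the weights, is often what caps batch size.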
And with extreme chip shortages expected over the next two years, there's little appetite for even bigger models anyway.
Barring a breakthrough in scaling, the only direction models can really go is smaller, which will inevitably mean better-performing local models for the same chip budget.