Hacker News

wahnfrieden today at 4:56 AM

No one is going to run models that are comparable to frontier locally without spending enormous sums for use at scale or in large orgs. Even with cheap RAM, you will still need a very large budget for frontier-level capability.

Open models that are competitive with frontier will be used on shared hosts.


Replies

zozbot234 today at 6:38 AM

> No one is going to run models that are comparable to frontier locally without spending enormous sums for use at scale

You can always run these models cheaper locally if you're willing to compromise on total throughput and speed of inference. For most end-user or small-scale business needs, you don't really need a lot of either.

jorvi today at 5:21 AM

Models capped out on training and (active) parameters a while ago; it's the tooling and harness that are making the big jumps in performance happen. And then you have things like DeepSeek with a pretty small KV cache.

And with the extreme chip shortages for the next two years, there's little appetite for even bigger models anyway.

Barring a breakthrough in scaling, the only direction the models can really go is smaller, which will inevitably mean better-performing local models for the same chip budget.