Hacker News

daxfohl today at 5:39 AM

I just wonder how long it'll take local models to be good enough for 99% of use cases. It seems like it has to happen sooner or later.

My hunch is that in five years we'll look back and see current OpenAI as something like a 1970's VAX system. Once PCs could do most of what they could, nobody wanted a VAX anymore. I have a hard time imagining that all the big players today will survive that shift. (And if that particular shift doesn't materialize, it's so early in the game; some other equally disruptive thing will.)


Replies

maxloh today at 10:59 AM

In my experience with Gemini, most of its capabilities stem from web search rather than from anything it has already "learned." Even if you could obtain the model weights and run them locally, the quality of the output would likely drop significantly without that live data.

To really have local LLMs become "good enough for 99% of use cases," we are essentially dependent on Google's blessing to provide APIs for our local models. I don't think they have any interest in doing so.

sdrinf today at 8:33 AM

Taking the opposite side of that bet, here is why:

* Even if an open-weight model exceeding SOTA appeared on Hugging Face today, given my extensive experience with a wide variety of model sizes, I would find it highly surprising if "99% of use cases" could be covered by a <100B model.

* Meanwhile, I pulled up Claude to look into consumer GPU VRAM growth rates: median consumer VRAM went from 1-2GB in 2015 to ~8GB in 2026, roughly doubling every 5 years; the top end isn't much better, just two cycles ahead.

* Putting aside current RAM sourcing issues, it seems very unlikely that even high-end prosumers will routinely have >100GB of VRAM (= the ability to run a quantized SOTA 100B model) before ~2035-2040.
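The doubling-rate argument above can be sketched as a quick back-of-the-envelope projection. The base figures (~8GB median in 2026, top end "two cycles ahead", i.e. ~32GB) are the commenter's estimates, not hard data:

```python
import math

def year_vram_reaches(target_gb, base_gb, base_year=2026, doubling_years=5):
    """Project the year VRAM reaches target_gb, assuming it doubles every doubling_years."""
    return base_year + doubling_years * math.log2(target_gb / base_gb)

# Median consumer VRAM: ~8GB in 2026, doubling every 5 years.
print(round(year_vram_reaches(100, base_gb=8)))   # median hits 100GB around 2044
# Top end assumed two doubling cycles ahead of the median: ~32GB in 2026.
print(round(year_vram_reaches(100, base_gb=32)))  # top end hits 100GB around 2034
```

Under those assumptions the top end crosses 100GB in the mid-2030s and the median only in the mid-2040s, which is roughly where the ~2035-2040 window comes from.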

Havoc today at 9:57 AM

I think a large portion of people won't settle for "good enough" if better is available for cheaper.

Datacenters simply scale better than home servers on cost and performance.

So it only really works for people who value local highly - which isn't most people.

athrowaway3z today at 10:54 AM

Five years is a bit optimistic. I have no desire to use anything dumber than Claude - but I doubt I'll need something much smarter either, or with so much niche knowledge baked in. The harness will take care of much of it. Faster would be nicer, though.

That still requires a pretty large chip, and those will be selling at an insane premium for at least a few more years before a real consumer product can try their hand at it.

tim333 today at 10:35 AM

The trend with email, websites and so on has been to use some large cloud service rather than self-host, as it's easier. My bet is AI will be similar.

foo42 today at 7:55 AM

I hope you're right, but is there any guarantee that there will continue to be institutions willing to spend the money to produce open models?

I almost wonder if we need some sort of co-op for training and another for hosted inference.

bandrami today at 6:58 AM

Yesterday I asked mistral to list five mammals that don't have "e" in their name. Number three was "otter" and number five was "camel".

phi4-mini-reasoning took the same prompt and bailed out because (at least according to its trace) it interpreted it as meaning "can't have a, e, i, o, or u in the name".

Local is the only inference paradigm I'm interested in, but these things have a way to go.
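The constraint itself is trivial to check mechanically, which is what makes the failure striking. A minimal sketch; the answer list here is a hypothetical reconstruction apart from "otter" at position three and "camel" at position five, which are from the exchange above:

```python
def lacks_letter(name: str, letter: str = "e") -> bool:
    """True if `letter` does not appear anywhere in `name` (case-insensitive)."""
    return letter not in name.lower()

# Hypothetical reconstruction of the model's list; only items 3 and 5 are from the thread.
answers = ["bison", "lynx", "otter", "fox", "camel"]
failures = [a for a in answers if not lacks_letter(a)]
print(failures)  # ['otter', 'camel'] -- both contain an "e"
```

A three-line check like this catches what the model missed, which is why letter-constraint prompts remain a popular smoke test for small local models.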

otabdeveloper4 today at 9:43 AM

> I just wonder how long it'll take local models to be good enough for 99% of use cases.

Qwen 2.5 was already there. "99% of use cases" isn't a very high bar right now.