It's artificial scarcity. LLM inference will soon be commodity as cloud. There is a 2-3years ...

isawczuk • last Thursday at 9:43 PM • 2 replies • view on HN

It's artificial scarcity. LLM inference will soon be commodity as cloud.

There is a 2-3years still before ASIC LLM inferences will catch up.

Replies

observationist • last Thursday at 10:01 PM

The problem with this idea is that someone can, and likely will, come up with the next best architecture that leapfrogs the current frontier models at least once a year, likely faster, for the foreseeable future. This means by the time you've manufactured your LLM on an ASIC, it's 4-5 generations behind, and probably much less efficient than current SOTA model at scale.

It won't make sense for ASIC LLMs to manifest until things start to plateau, otherwise it'll be cheaper to get smarter tokens on the cloud for almost all use cases.

That said, a 10 trillion parameter model on a bespoke compute platform overcomes a lot of efficiency and FOOM aspects of the market fit, so the angle is "when will models that can be run on an asic be good enough that people will still want them for various things even if the frontier models are 10x smarter and more efficient"

I think we're probably a decade of iteration on LLMs out, at least, and the entire market could pivot if the right breakthrough happens - some GPT-2 moment demonstrating some novel architecture that convinces the industry to make the move could happen any time now.

vessenes • last Thursday at 9:45 PM

I don't think so. GB200 prices are GOING UP. A100s are still expensive. This implies massive utilization and demand, no? These machines are not sitting idle, or prices would drop in the very competitive hyperscaler environment.

➕ show 2 replies

alt Hacker News

Replies