Hacker News

The beginning of scarcity in AI

163 points by gmays, last Thursday at 8:49 PM | 208 comments

Comments

keiferski, last Friday at 7:58 AM

We just had a realization during a demo call the other day:

The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up. Not being dependent on LLMs for your fundamental product’s value will be a major advantage, at least in pricing.

sixhobbits, last Friday at 3:37 PM

There is a lot of demand still coming, for sure, but I'm more optimistic. Ready to eat my hat on this, but:

- Higher prices will result in huge demand destruction too. Currently we're burning a lot of tokens just because they're cheap, but a lot of heavy users will spend the time moving flows over to Haiku or on-prem micro models the moment pricing becomes a topic.

- Data centers do not take that long to build. There are probably bottlenecks in weird places like transformers that will cause some hiccups, but Nvidia's new stuff is way more efficient, and the overall pipeline of capacity coming online is massive.

- We will probably still see more optimization at the harness level: better caching, a better mix of smaller models for some uses, etc. (a toy sketch of the idea follows below).
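
A minimal sketch of that kind of harness-level optimization, assuming nothing beyond Python's standard library: an exact-match prompt cache plus a toy difficulty heuristic that routes easy requests to a cheap model. The model names and the heuristic are illustrative, not anyone's production setup.

    import hashlib

    CACHE = {}  # exact-match prompt cache: hash of prompt -> response

    def classify_difficulty(prompt: str) -> str:
        # Toy heuristic: long prompts go to the big model.
        return "hard" if len(prompt) > 2000 else "easy"

    def call_model(model: str, prompt: str) -> str:
        # Stub standing in for a real API or local-inference call.
        return f"[{model}] response to: {prompt[:40]}"

    def complete(prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in CACHE:
            return CACHE[key]  # cache hit: no tokens burned
        model = ("small-cheap-model" if classify_difficulty(prompt) == "easy"
                 else "frontier-large-model")
        CACHE[key] = call_model(model, prompt)
        return CACHE[key]

    print(complete("Summarize this ticket in one line."))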

These companies have so much money, and at least Anthropic and OpenAI are playing for winner-takes-all stakes, with competition from the smaller players too. I think they're going to be feeding us for free to win favour for quite a while still.

Let's see though.

dmazin, last Thursday at 9:45 PM

Constraints can lead to innovation. Just two things that I think will get dramatically better now that companies have incentive to focus on them:

* harness design

* small models (both local and not)

I think there is still tremendous low-hanging fruit in both areas.

wg0, last Thursday at 10:21 PM

There's another side to this too.

Whoever is running and selling their own models with inference is invested down to the last dime available in the market.

Those valuations are already ridiculously high, be it Anthropic or OpenAI, easily to the tune of a couple of trillion dollars combined.

All that investment is seeking a return. Correct me if I'm wrong.

Developers and software companies are the only serious users, because they (mostly) review the output of these models out of both culture and necessity.

Anywhere else? Other fields? There, these models aren't useful, or aren't as useful, and revenue from software companies is by no means going to deliver returns on trillion-dollar valuations. Correct me if I'm wrong.

To make matters worse, there's a hole in the bucket in the form of open weight models. When squeezed further, software companies would either deploy open weight models or resort to writing code by hand, because that's a very skilled and hardworking tribe; they've been doing this all their lives, and whole careers are built on it. Correct me if I'm wrong.

Eventually, ROI might not be what VCs expect, and constant losses might lead to bankruptcies. All that data center build-out would suddenly be looking for someone to rent its compute capacity, and the result would be dime-a-dozen open weight model providers with generous usage tiers, capitalizing on capacity whose bankrupt owners can no longer use it and want to liquidate it to recoup as much of the investment as possible.

EDIT: Typos

jFriedensreich, last Friday at 3:55 PM

This is probably even the "fun" part of the whole picture. The pure dystopia starts when investment firms just silently grow bigger and bigger data centers, like a cancer. There will be no press releases, no papers, no chance that anyone without billions will even know the details, let alone get access. One day we realise the world's resources (maybe not as in the paperclip maximiser, but as in memory, energy, GPUs, water, locations) are consumed by trading models, and the data centres are already guarded by robot armies. While we were distracted fighting with Anthropic and OpenAI, the real war was already over.

Mythos is one sign in this direction, but I also met a few people who claimed to fund fairly large research and training operations purely with internal models working on financial markets. I have no way to verify those claims, but this has happened 3 times now, and the papers/research they were working on looked pretty solid; it did not seem like they were running kimi openclaw on polymarket, but actual models on some significant funds. I would be really interested if anyone here has details on this. I would also not be surprised if this is just something people in SF claim in order to sound dangerous and powerful.

KaiserPro, last Friday at 10:50 AM

One graph. One graph, and the author is pinning an entire theory on it?

Infra is always limited, even at hyperscalers. This leads to a bunch of tools for caching, profiling, and generally getting performance up, not to mention binpacking and all sorts of other "obvious" things.

skybrian, last Friday at 4:27 PM

That’s a lot of speculation based on one graph. It doesn’t cover whatever Google is doing, for example.

sdevonoes, last Friday at 9:23 AM

It’s time to be AI-independent. It’s like AWS: for most of us, it’s not worth it.

cowartc, last Friday at 3:07 PM

The scarcity framing assumes compute is the bottleneck. For most production deployments I've seen, the actual bottleneck is evaluation and knowing what to trust.

You can throw cheaper models at a problem all day, but if you can't measure where the model fails on your data, you're just making mistakes faster at a lower cost.

Compute gets cheaper. Reliable evaluation doesn't.
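
A minimal sketch of what "measuring where the model fails on your data" can look like in practice: a tiny labeled golden set and a loop that tallies accuracy and confusion pairs per model. The dataset, label scheme, and model_answer stub are all illustrative assumptions.

    from collections import Counter

    golden_set = [  # tiny illustrative labeled set
        {"input": "Refund policy for damaged goods?", "expected": "refund"},
        {"input": "How do I reset my password?", "expected": "reset_link"},
    ]

    def model_answer(model: str, text: str) -> str:
        # Stub standing in for any API or local model call plus label mapping.
        return "refund" if "refund" in text.lower() else "reset_link"

    def evaluate(model: str) -> None:
        correct, failures = 0, Counter()
        for ex in golden_set:
            pred = model_answer(model, ex["input"])
            if pred == ex["expected"]:
                correct += 1
            else:
                failures[(ex["expected"], pred)] += 1  # (expected, got) pairs
        print(f"{model}: {correct}/{len(golden_set)} correct; "
              f"top confusions: {failures.most_common(3)}")

    evaluate("cheap-small-model")  # hypothetical model name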

henry2023, last Thursday at 9:55 PM

The US is bound by energy and China is bound by compute power. Whichever solves its limitation first will end this “Scarcity Era”.

0xbadcafebee, last Friday at 2:13 PM

This isn't the first time they've dealt with scarcity; there has been supply chain scarcity four times since 2000: the post-dotcom boom, CDMA scarcity, HDD/flash scarcity, and pandemic scarcity.

The scarcity isn't long-term. Like all manufactured products, they'll ramp up production and flood the market with hardware, people will buy too much, and the market will drop. Boom and bust.

We're also still in the bubble. Eventually markets will no longer bear the lack of productivity/profit (as AI isn't really that useful), and there will be divestment and more hardware on the market as companies implode. Nobody is making 10x more from AI; they are just investing in it hoping for those profits, which so far I don't think anyone has seen, other than the companies selling the AI to other companies.

But more importantly, models and inference keep getting more efficient, so less hardware will do more in the future. We already have multiple models good enough for on-device small-scale work. In 5 years, consumer chips and model inference will be so good you won't need a server for SOTA. When that happens, most of the billions invested in SOTA companies will disappear overnight, which'll leave a sizeable hole in the market.

2001zhaozhao, last Thursday at 11:43 PM

AKA, the beginning of big companies being able to roll over small companies with moar money.

(Note: I don't expect this to actually happen until AI gets good enough to either almost entirely replace humans or to solve cooperation, but the long-term trend of scarce AI points in that direction.)

ttul, last Friday at 5:35 AM

Energy scarcity will drive more innovation in local silicon and local inference. Apple will be the unexpected beneficiary of this reality.

siliconc0w, last Friday at 1:43 PM

Definitely feeling this - the subsidized subscription plans are already starting to buckle.

com2kid, last Thursday at 9:58 PM

To bang on the same damn drum:

Open weight models are 6 months to a year behind SOTA. If you were building a company a year ago based on what AI could do then, you can build that company today with models that run locally on a user's computer. Yes, that may mean requiring your customers to buy MacBooks or desktops with Nvidia GPUs, but if your product actually improves productivity by any reasonable amount, that purchase cost is quickly made up for.

I'll argue that for anything short of full computer control or writing code, the latest Qwen model will do fine. Heck, you can get a customer service voice chatbot running in 8GB of VRAM plus a couple more gigs for the ASR and TTS engines, and it'll be more powerful than the hundreds of millions spent on chatbots powered by GPT-4.x.
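
A sketch of the kind of local pipeline being described, under stated assumptions: faster-whisper for ASR, llama-cpp-python for the LLM, and a placeholder for any local TTS engine (e.g. Piper). The model file name is hypothetical; this is one plausible stack, not the commenter's.

    from faster_whisper import WhisperModel  # local speech-to-text
    from llama_cpp import Llama              # local LLM inference

    asr = WhisperModel("small")              # fits in a few GB
    llm = Llama(model_path="qwen-7b-instruct-q4.gguf", n_ctx=4096)

    def speak(text: str) -> None:
        # Placeholder for a local TTS engine.
        print(f"[TTS] {text}")

    def handle_call(audio_path: str) -> None:
        # ASR -> LLM -> TTS, all on one consumer GPU.
        segments, _ = asr.transcribe(audio_path)
        user_text = " ".join(seg.text for seg in segments)
        reply = llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "You are a helpful support agent."},
                {"role": "user", "content": user_text},
            ],
        )["choices"][0]["message"]["content"]
        speak(reply)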

This is like arguing the age of personal computing was over because there weren't enough mainframes for people to telnet into.

It misses the point. Yes, deployment and management of personal PCs was a lot harder than dumb terminal + mainframe, but the future was obvious.

latentframe, last Friday at 12:57 PM

This isn’t really looking like AI scarcity; it’s more like compute becoming the bottleneck. When access depends on chips, energy, and capital, it stops being a pure software game, and the winners are often whoever can secure capacity first.

vessenes, last Thursday at 9:36 PM

It seems very possible that we have at least five years of real limitations on compute coming up. Maybe ten, depending on ASML. I wonder what an overshoot looks like. I also wonder if there might be room for new entrants in a compute-scarce environment.

For instance, at some point, could CoreWeave field a frontier team by holding back 10% of its allocations over time? Pretty unusual situation.

NoSalt, last Friday at 2:59 PM

A few years ago, I purchased a handful of 250GB SSDs from Amazon for $17.00 each.

Last year, I purchased a few 8TB hard drives for $80.00 each.

Today, I am sad. ;-(

utopiah, last Friday at 7:36 AM

Initially I thought, "Well... good for AI companies, because they can then charge more," but IMHO that's a very tricky position, because it means the cheap wave is behind us.

It's one thing to "sell" free or symbolically cheap stuff; it's another to have an actual client who will do the math and compare expenditure against actually delivered value.

bcjdjsndon, last Friday at 12:12 PM

This is neither the first time, nor are they really confronting it.

tim333, last Friday at 8:31 AM

>For the first time since the 2000s, technology companies are confronting the limits of their supply chain.

I thought there'd been a shortage of cheap GPUs since ChatGPT took off, and also before that during various crypto booms. I'm not sure it's a new thing.

stupefy, last Thursday at 9:34 PM

What limits LLM inference accelerators? I've heard about Groq (https://groq.com/), but I'm not sure how far it pushes the problem away.

mattas, last Thursday at 9:36 PM

This notion that "we don't have enough compute" does not cleanly reconcile with the fact that labs are burning cash faster than any cohort of companies in history.

If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."
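
The arithmetic behind the analogy, as a toy sketch (all numbers illustrative): when the sale price is below marginal cost, more volume only means bigger losses, so "not enough oranges" is never the binding complaint.

    cost_per_orange, price_per_orange = 1.00, 0.50

    for volume in (1_000, 10_000, 100_000):
        profit = volume * (price_per_orange - cost_per_orange)
        print(f"sell {volume:>7,} oranges -> profit ${profit:,.0f}")
    # Every extra orange sold loses another $0.50.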

piokoch, last Friday at 7:20 AM

Well, it's in the books: O(n^2) algorithms are bad in the long run, and the transformer's attention step has exactly that complexity in sequence length, so it's no big surprise that we hit the limits.
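
Concretely, the quadratic term shows up in the attention score matrix. A minimal NumPy sketch with toy dimensions: doubling the sequence length quadruples the number of score entries.

    import numpy as np

    n, d = 4096, 64                    # sequence length, head dimension
    Q, K, V = (np.random.randn(n, d) for _ in range(3))

    scores = Q @ K.T / np.sqrt(d)      # (n, n): the O(n^2) object
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    out = weights @ V                  # (n, d)

    print(f"score-matrix entries: {n * n:,}")  # 16,777,216 for n = 4096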

chatmasta, last Friday at 11:34 AM

Why is this written with the assumption that we have finite hardware production capacity? Industrial processes can scale up, new factories can come online… it will take a while, but the whole point of economics is that supply will scale to meet demand. The shortage is a temporary, point-in-time metric.

And that’s not considering the software innovation that can happen in the meantime.

frigg, last Friday at 12:14 PM

The models have already plateaued; you don't need the latest and greatest.

Bengalilol, last Friday at 11:54 AM

... and I have this little idea in the back of my mind: when companies can no longer keep up with demand and people have (albeit more limited and reduced) local capacity, minds will start focusing on techniques (more humble and modest ones) to keep part of the system running locally, without dependency.

I know it may sound ridiculous, but it could actually become a way to break away from the business models that have been developed over the past few decades. Broadly speaking, this even amounts to saying that the biggest victims of AI could be the companies that bet on AI as a service.

I know my vision is way too idealistic, but I'm coming to imagine that a human brain, although less efficient in the long run, remains a reliable way to control the resulting costs and could even turn out to be more advantageous and more readily available than its silicon-based counterpart.

czk, last Thursday at 9:53 PM

"adaptive" thinking

itmitica, last Thursday at 9:53 PM

The current inference system is on a downward slope.

It remains to be seen what new wave of AI system or systems will replace it, making the whole current architecture obsolete.

Meanwhile, they are milking it, in the name of scarcity.

byyoung3, last Thursday at 10:04 PM

distillation is an equalizing force

insane_dreamer, last Friday at 4:16 PM

Companies that become dependent on AI to "optimize" their offerings/processes are going to face some serious vendor lock-in unless they do it in such a way that they can swap out the foundation model.

rafaelero, last Friday at 3:01 PM

Companies that could see this clearly and ignored the "AI is a bubble, duh" crowd will ultimately benefit from the GPUs they already acquired. The companies that acted cautiously will get burned.

AtlasBarfed, last Friday at 2:49 PM

Pay for the latest AI for EXCLUSIVE POWERS ...

Trying to up-tier fractional improvements in something that can't easily be quantified, and !BONUS! with gated access it can't be as easily analyzed by the (low-profit) AI analyzers/benchmarkers.

Foster paranoia among top executives that the fractional/debatable improvement is a MUST HAVE to STAY COMPETITIVE in your industry.

Meanwhile, I have not seen any improvement in software in the now almost three to four years since mainstream LLM and AI coding assistance arrived on the scene.

Although I will hold out the possibility that software has actually gotten far worse for the end user, because AI code is being dedicated to revenue enhancement and dark data collection.

yalogin, last Thursday at 10:08 PM

Does this also mean RAM prices are not coming down anytime soon?

isawczuk, last Thursday at 9:43 PM

It's artificial scarcity. LLM inference will soon be as much of a commodity as cloud compute.

There are still 2-3 years before ASIC LLM inference catches up.

paulddraper, last Thursday at 10:02 PM

This is wrong along multiple axes.

1. Supply can scale. You can point to COVID-era supply-chain shocks, but the problem there was temporary changes in demand. No one spins up a whole fab to address a 3-month spike, whereas AI is not a temporary demand change.

2. Models are getting more efficient. DeepSeek V3 was 1/10th the cost of the contemporary ChatGPT. Open weight models get more runnable or smarter every month. The cutting edge is always the cutting edge, but if scarcity is real, model selection will adjust to fit it.

throwaway290, last Friday at 8:17 AM

Wasn't AI supposed to get us to post-scarcity?

mystraline, last Friday at 2:33 PM

And folks are just now realizing the SaaS token provider rug-pull?

How convenient, especially since everything has some LLM slop interaction.

But that rug isn't going to pull itself!

Lapalux, last Thursday at 9:28 PM

"The first hit is free....."
