I don't doubt the increase in efficiency. I doubt the "drastically".
We already see models become more and more capable per weight and per unit of compute. I don't expect a state-change breakthrough. I expect: more of the same. A SOTA 30B model from 2026 is going to be ~30% better than one from 2025.
Now, expecting that to hurt Nvidia? Delusional.
No one is going to stop and say "oh wow, we got more inference efficiency - now we're going to use less compute". A lot of people are going to say "now we can use larger and more powerful models for the same price" or "with cheaper inference for the same quality, we can afford to use more inference".
Eh.
Right now, Claude is good enough. If LLM development hit a magical wall and never got any better, Claude would still be terrifically useful, and there are diminishing returns on how much more good we get out of it being at $benchmark.
Saying we're satisfied with that... well, how many years until efficiency gains on one side and consumer hardware on the other meet in the middle, so that "good enough for everybody" open models are available to anyone willing to pay for a $4000 MacBook (and after another couple of years a $1000 MacBook, and several more after that, a fancy wristwatch)?
Point being, unless we get to a point where we start developing "models" that deserve civil rights and citizenship, the years in which we NEED cloud infrastructure and datacenters full of racks and racks of $x0,000 hardware are numbered.
I strongly believe the top end of the S curve is nigh, and with it we're going to see these trillion-dollar ambitions crumble. Everybody is going to want a big-ass GPU and a ton of RAM, but that's going to quickly become boring, because open models are going to exist that eat everybody's lunch. The trillion-dollar companies trying to beat them with a premium product aren't going to stack up outside of niche cases and much more ordinary cloud compute motivations.