There is a piece of this I agree with: you do not need to be a deep technical expert to notice that a company is burning cash by overcommitting to capex or relying on heroic revenue projections that may or may not come to pass.
But that is not the full argument he is making. If the claim is that the labs will not be able to pay their creditors because inference is structurally incapable of becoming profitable, then he absolutely needs to be right about the technical economics of inference.
One part of that is the balance-sheet argument (which already shows insanely good margins). But it also depends on how inference-time compute actually works: routing, batching, KV cache reuse, model segmentation, different latency tiers, etc. He's been straight up wrong about many of those details in his writing, so I have to call the rest of his reasoning into question as well (in part to avoid Gell-Mann amnesia).
Doesn't this kinda imply its own smoke and mirrors though? Like, if the name of the game with inference is already routing things around and caching so you can make money, why is the newest, biggest model always the most critical thing? How does this square with any of their press about it? Also, wouldn't routing just add more inference, because you need to pre-judge every prompt to know where to route it?
Also, if there are significant gains from caching, then like.. what are we even doing here? Inputting something and then reading back cached pieces of text based on their similarity to the input? Kinda like a search engine?