Are ya fuckin' serious mate?
The restaurant next to the mines was profitable up until the moment the mines themselves shut down: one doesn't exist without the other.
You can't ringfence inference as "the profitable bit" and then hand-wave away the training. Without continuous training there is no inference product.
Claude 3 Opus isn't sitting there earning revenue in 2026 - the thing has simply been deprecated. The moment you stop spending billions on the next model, your "profitable" inference business is on borrowed time until someone else makes it obsolete.
Maybe I made a mistake in my analogy... They're not growing a farm and then selling oranges. They're on a treadmill where stopping is death, and the treadmill costs $10bn a year to keep running.
> They're on a treadmill where stopping is death, and the treadmill costs $10bn a year to keep running.
You’re literally describing all companies. Google takes about $270bn/year to run; if they stopped spending that, they’d die pretty darn quick. It’s also a description of working - unless you’ve built up significant savings, if you stop working you’re also going to die.
What's the point of these words and analogies when the only thing that matters is the numbers? A gross margin of 20% versus 70% makes a world of difference (literally the difference between a company that's about to collapse and a multi-trillion-dollar self-sustaining juggernaut), but in your world of words these two companies are the same thing.
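To make the margin point concrete, here's a toy calculation using entirely hypothetical revenue figures (only the $10bn/year treadmill cost comes from the thread; the rest is made up for illustration):

```python
# Toy model: can inference gross profit cover a fixed yearly training bill?
# All figures are hypothetical illustrations, not real company numbers.

def yearly_surplus(revenue: float, gross_margin: float, training_cost: float) -> float:
    """Gross profit from inference minus the recurring training spend."""
    return revenue * gross_margin - training_cost

revenue = 20e9    # assume $20bn/year of inference revenue (hypothetical)
training = 10e9   # the $10bn/year "treadmill" cost from the comment above

low = yearly_surplus(revenue, 0.20, training)   # 20% gross margin
high = yearly_surplus(revenue, 0.70, training)  # 70% gross margin

print(f"20% margin: {low / 1e9:+.0f}bn/year")   # negative: burning cash
print(f"70% margin: {high / 1e9:+.0f}bn/year")  # positive: self-sustaining
```

Same revenue, same treadmill cost, and the margin alone flips the business from losing billions a year to funding its own next training run.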
> Claude 3 Opus
Unless they are changing the architecture in huge ways, the pre-training done for 3 carries into later models. I am sure the frontier labs are figuring out how to pretrain generic feedstocks that can be fed into downstream training pipelines. DeepSeek's incremental training-run cost was what, $5M? Alibaba and DeepSeek have the best, most efficient training pipelines - look at the rate at which custom Qwen models are being pumped out.
> Without continuous training there is no inference product.
This claim deserves teasing apart.
Clearly, training is a Red Queen's race today. If a model provider were to unilaterally decide to stop training, they would very quickly lose market share to competitors with better models.
On the other hand, what if market and investment conditions change such that everybody has to stop training?
In that case, the models are still there and still as useful as they were the day before. So why wouldn't there still be an inference product?