As soon as tokens stop being subsidized, heavy agentic use will become at least as expensive as paying an (entry-level) employee. When that happens, many companies will trade heavy token usage for (maybe slightly slower, slightly less accurate) employees again.
You're assuming the price won't come down as the tech matures. That seems like a big assumption, considering how quickly open weights models are catching up to frontier models, and how little effort has been invested so far in optimizing inference costs.
It's especially a crazy assumption to make relative to the costs of employing a human. The costs of paying an entry-level employee are unlikely to go down at all, and even if those costs do decline, there's a floor they can't drop below (minimum wage at the extreme end), whereas companies are free to optimize agentic costs as close to zero as possible.
So you are assuming that a cost which is extremely susceptible to optimization but which no one has yet seriously attempted to minimize will remain perpetually above a cost which is much less susceptible to optimization, is already subject to enormous efforts to minimize, and has a legally mandated floor. That seems like a bad bet.
Maybe this just counts as “light use” since I’m a hobbyist programmer and I only run one coding agent session at a time, but I get about as much done as I did back when I was working while spending a lot of time browsing the Internet, etc.
I used to spend $10-$20 a day using Claude to write code, and I’m closer to $5 a day now that I mostly use DeepSeek and GLM. That’s at API pricing (no subscriptions), since I don’t use Claude Code.
This is a rounding error for a company. So I think there’s plenty of room to use AI extensively while being more cost-conscious.
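To put those daily figures in annual terms, here's a quick sketch. The 250-working-days-per-year figure is my assumption, not the commenter's; the daily amounts are the ones quoted above.

```python
# Annualize the daily API spend quoted in the comment.
# Assumption (not from the comment): about 250 working days per year.
WORKDAYS_PER_YEAR = 250


def annual_spend(daily_dollars: float, days: int = WORKDAYS_PER_YEAR) -> float:
    """Yearly cost if the same daily spend is repeated every working day."""
    return daily_dollars * days


# Daily figures from the comment: $10-$20 on Claude, ~$5 on DeepSeek/GLM.
for label, daily in [("Claude, low", 10), ("Claude, high", 20), ("DeepSeek/GLM", 5)]:
    print(f"{label:>13}: ${annual_spend(daily):,.0f}/yr")
```

Under that assumption the range is roughly $1,250 to $5,000 a year per developer.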
A significant caveat is that there is a pricing mismatch that lets first-party providers subsidize quite heavily.
Agents are expensive in large part because tool calls require round trips. These APIs are stateless and non-streaming, so you have to resend the whole context on every call. Over a session that adds up to roughly (number of tool calls) × (half the final context size) cached input tokens, since the context grows as the session goes on. Most API providers overcharge by a huge amount for cached tokens, DeepSeek being an exception. Paying OpenAI $0.05 for 100k cached GPT-5.5 tokens during an agent tool call whose round trip might take 2 seconds is like paying $100/hr for what is likely ~10 to 20 GB of VRAM residency (holding the KV cache).
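A minimal sketch of the arithmetic in the comment above, assuming (as the ½ factor suggests) that the context grows roughly linearly, so each call resends about half the final context on average. The 50-call, 200k-token session is a made-up example, and the $0.05 per 100k figure is taken from the comment, not from a published price sheet. Note that $0.05 every 2 seconds comes out to about $90/hr, which the comment rounds to ~$100/hr.

```python
# Rough cost model for cached input tokens in a stateless agent session.
# Assumption: context grows roughly linearly, so each tool call resends
# about half of the final context size on average.


def cached_tokens_per_session(tool_calls: int, final_context_tokens: int) -> float:
    """Approximate total cached input tokens resent over one session."""
    return tool_calls * final_context_tokens / 2


def hourly_rate(cost_per_call: float, round_trip_seconds: float) -> float:
    """Dollars per hour if one such call is billed on every round trip."""
    return cost_per_call * 3600 / round_trip_seconds


# Figures from the comment: $0.05 per 100k cached tokens, ~2 s round trip.
cost_per_call = 0.05
price_per_million = cost_per_call / 100_000 * 1_000_000  # implied $ per 1M cached tokens

print(f"implied cached price: ${price_per_million:.2f} per 1M tokens")
print(f"equivalent rate: ${hourly_rate(cost_per_call, 2.0):.0f}/hr")

# Hypothetical session: 50 tool calls, context growing to 200k tokens.
tokens = cached_tokens_per_session(50, 200_000)
print(f"cached tokens resent: {tokens:,.0f}")
print(f"cached-token cost: ${tokens / 1_000_000 * price_per_million:.2f}")
```

The quadratic shape is the point: doubling both the number of calls and the final context size roughly quadruples the cached tokens billed.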
Or it gets offloaded to NVMe, and you're paying $0.05 for that much PCIe bandwidth.
It's more straightforward to talk about the hardware directly. Full Kimi K2.6 needs an 8x H200 node to run and can serve around 20 heavy users. You can rent an 8x H200 node for around $30/hr.
I'd imagine GPT-5.5 and Claude Opus 4.7 could run just fine on a 16x H200 node and serve at least 10 heavy users without the token output getting choppy.
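Putting the hardware numbers from the two comments above on a per-user basis, as a rough sketch. It assumes the node is fully shared among the stated number of heavy users and that rental price scales linearly with GPU count, so the 16x H200 price is extrapolated from the $30/hr 8x figure rather than quoted. The 8-hour workday is likewise an illustrative assumption.

```python
# Per-user hourly cost of a rented inference node, using the thread's figures.
# Assumptions: the node is fully utilized by the stated number of heavy users,
# and rental price scales linearly with GPU count (16x price is extrapolated).


def cost_per_user_hour(node_price_per_hour: float, users: int) -> float:
    """Split a node's hourly rental price evenly across its users."""
    return node_price_per_hour / users


H200_PRICE_PER_GPU_HOUR = 30.0 / 8  # derived from "$30/hr for an 8x H200 node"

kimi = cost_per_user_hour(8 * H200_PRICE_PER_GPU_HOUR, 20)
frontier = cost_per_user_hour(16 * H200_PRICE_PER_GPU_HOUR, 10)

print(f"Kimi K2.6 on 8x H200, 20 users: ${kimi:.2f}/user/hr")
print(f"Frontier guess on 16x H200, 10 users: ${frontier:.2f}/user/hr")

# Over a hypothetical 8-hour workday:
print(f"per workday: ${kimi * 8:.2f} vs ${frontier * 8:.2f}")
```

Under these assumptions that's about $1.50 vs $6.00 per user-hour, or $12 vs $48 per workday. It ignores idle capacity, redundancy, and everything else a provider pays for beyond GPU rental.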
What's funny is that the Uber COO apparently didn't think about this, even though Uber is arguably one of the most successful companies ever at the "subsidize to drive down costs until you capture nearly the entire market" strategy.
I think if local models catch up with current SOTA, then that might not happen. Either way, I don't think the long-term picture for OAI, Anthropic, etc. really holds up.
This is what I’m betting on.
The financials don’t make sense now. Based on the expenditure, they won’t ever make sense.
I have been saying the same for a while. Someone always says "but Anthropic is making money on their API" or "but inference will get cheaper". But I don't believe it. First, all the investments have to be paid off at some point, and second, there are other things that cost money. I don't believe that any of them have a positive balance sheet.
I also don't think blitzscaling will work the way it did with Uber. The engineers are still there. We can work without the LLM tools.
DeepSeek is an open weights model. It's possible the hosted versions are subsidized, but we know what it costs to run locally. And it's expensive, but it's also pretty clearly cheaper than an employee.
Of course, the latest DeepSeek models are not as good as Claude, but they're not super far off either.