Speaking as someone who's bootstrapping here, I'm often envious of engineers at these larger companies, but I also worry that the incentives are screwed up.
If I were an engineer at Uber, why wouldn't I select gpt 5.5 pro @ very high thinking + fast mode for a prompt? There's no incentive not to use the most powerful (and thus most expensive) model for even the smallest of changes.
I tried one of these prompts for some tests I'm doing for image->html conversion, and a single prompt cost me $40. As someone who's paying for that myself, I'd pretty much never use this configuration. If I were at a large company where someone else was footing the bill, I'd spin these up regularly (the output was significantly better, fwiw). Engineers are rated on what they deliver, not on the expenditure to get there.
There are ways to do this cheaply, but there are no incentives for engineers to do so.
Companies may first want to see how fast you can scale work and then trim it back down for efficiency.
image->html is a pretty involved task though. That’s basically a frontend dev’s job. $40 wouldn’t cover an hour of their time.
SWEs are expensive; the median salary is $133k (not counting health insurance, payroll taxes, etc.), which works out to roughly $66.50/hour over ~2,000 working hours a year. If you can shave off an hour of dev time with $40 in LLM credits, that's $26.50 cheaper than having them do it unassisted.
I'm not entirely convinced it works out that way so far, but that's the theory.
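The arithmetic above can be checked with a quick back-of-envelope script. The $133k salary and $40 credit spend come from the thread; the ~2,000 working hours per year is an assumption (40 hours/week for 50 weeks), not a figure anyone stated.

```python
# Back-of-envelope check of the comment's arithmetic.
median_salary = 133_000          # median SWE salary from the thread
working_hours_per_year = 2_000   # assumption: 40 hrs/week * 50 weeks
llm_cost = 40                    # credits spent on the prompt

hourly_rate = median_salary / working_hours_per_year   # $66.50/hour
net_savings = hourly_rate - llm_cost                   # $26.50 per hour saved

print(f"${hourly_rate:.2f}/hour, net savings ${net_savings:.2f}")
# -> $66.50/hour, net savings $26.50
```

Note that the math only works if the $40 prompt actually replaces a full hour of work; if it saves less than ~36 minutes, it's a net loss.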
Trying to bring down LLM costs is sort of a double-edged sword, because the dev needs to be cutting LLM costs by more than what you're paying them. If it takes them a day to bring costs down by $1 an invocation, it takes almost 2 years to recoup the salary cost. It's worse because LLMs currently change so much that I wouldn't be confident their solution won't be broken before the 2-year period is up. Will we still be tool calling in 2 years, or will that be something new? Will thinking still be a thing, or will it be superseded by something else? I don't think anyone knows, even the frontier providers.
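The "almost 2 years" figure can be reconstructed under some assumptions: the $133k salary from upthread, an 8-hour day, and (crucially, since the comment doesn't state an invocation rate) one invocation per working day. Change any of those and the break-even moves a lot.

```python
# Rough break-even for a day spent cutting LLM costs by $1/invocation.
# All figures except the salary are assumptions, not from the thread.
hourly_rate = 133_000 / 2_000            # ~$66.50/hour (2,000 hrs/year assumed)
day_cost = hourly_rate * 8               # ~$532 of dev time spent on the fix
saving_per_invocation = 1                # $1 saved each time it runs

invocations_to_break_even = day_cost / saving_per_invocation   # ~532

working_days_per_year = 250              # assumption
invocations_per_day = 1                  # assumption: one invocation/day
years = invocations_to_break_even / (working_days_per_year * invocations_per_day)

print(f"{invocations_to_break_even:.0f} invocations, ~{years:.1f} years")
# -> 532 invocations, ~2.1 years
```

With a busier workload (say, 100 invocations a day), the same fix pays for itself in about a week, which is why the break-even depends so heavily on how often the optimized path actually runs.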