This is an important inflection point.
When tokens get correctly priced, all of the insane capital over-investment will need to pull back: the buying of data centers, semiconductors, and politicians.
Even then, it won't be right-priced with regard to actual costs. The environmental impact should have been priced in from the beginning. There seems to be a parallel with fossil fuel subsidies: underpricing encourages overdependence while ignoring the real costs society will pay later.
I've yet to see any compelling data about inference being particularly expensive. For local LLMs, which are becoming increasingly viable, it's dirt cheap. The same is true in the image gen world, where even a heavily dated GPU can now cheaply and quickly produce high-quality images.
I also think the image gen world is a useful analog because there are a million sites, presumably still making money, with markups that are multiple orders of magnitude above their costs. They're feeding off user ignorance that was, at least in part, artificially seeded by implying high costs for image gen back in its day. It's possible, even probable, that the initial training runs were expensive, but that's a one-and-done cost.
It rather looks like ChatGPT/Anthropic enterprise tokens and API calls are too expensive. Competition is quite strong on OpenRouter.
However, the real problem is running wild with token burning. With parallel agents calling subagents, you can burn lots of tokens per minute, especially with thousands of engineers.
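To put rough numbers on that fan-out, here's a back-of-the-envelope sketch in Python. Every figure in it (engineer count, agents per engineer, subagents per agent, tokens per call, call rate, and the price per million tokens) is a made-up placeholder, not data from any provider; the function names are hypothetical too. The only point is that the factors multiply.

```python
def tokens_per_minute(engineers, agents_per_engineer, subagents_per_agent,
                      tokens_per_call, calls_per_minute):
    """Rough token burn rate when every agent and subagent calls the API.

    Assumes (hypothetically) that each top-level agent and each of its
    subagents makes the same number of calls, each of the same size.
    """
    # One top-level agent plus its subagents counts as (1 + subagents) workers.
    workers = engineers * agents_per_engineer * (1 + subagents_per_agent)
    return workers * calls_per_minute * tokens_per_call


def cost_per_hour(tpm, usd_per_million_tokens):
    """Hourly spend for a given tokens-per-minute rate at a flat price."""
    return tpm * 60 / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # All inputs below are illustrative placeholders.
    tpm = tokens_per_minute(
        engineers=1000,
        agents_per_engineer=4,
        subagents_per_agent=3,
        tokens_per_call=2000,
        calls_per_minute=2,
    )
    print(f"tokens per minute: {tpm:,}")
    print(f"USD per hour at $5 per 1M tokens: {cost_per_hour(tpm, 5.0):,.2f}")
```

Swap in your own numbers; the takeaway is that headcount, agents, and subagents compound, so per-token price matters less than how many workers you let loose.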