I've yet to see any compelling data showing that inference is particularly expensive. For local LLMs, which are becoming increasingly viable, it's dirt cheap. The same is true in the image gen world, where even a heavily dated GPU can now cheaply and quickly produce high quality images.
I also think the image gen world is a useful analog because there are a million sites, presumably still making money, with markups that are multiple orders of magnitude above their costs. They're feeding off user ignorance that was, at least in part, artificially seeded by implying high costs for image gen back in its day. It's possible, even probable, that the initial training runs were expensive, but that's a one-and-done cost.