The costs of Netflix and Spotify are licensing. Offering the subscription at half price to additional users is non-cannibalizing and a way to get more revenue from the same content.
The costs of LLMs are the infrastructure. Unless someone can buy/power/run compute cheaper (Google w/ TPUs, locales with cheap electricity, etc.), there won't be a meaningful difference in costs.
That assumes inference efficiency is static, which isn't really the case. Between aggressive quantization, speculative decoding, and better batching strategies, the cost per token can vary wildly on the exact same hardware. I suspect the margins right now come from architecture choices as much as raw power costs.
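To make the batching point concrete, here's a back-of-the-envelope sketch. All the numbers (GPU hourly cost, throughput at each batch size) are illustrative assumptions, not measured figures; the point is just that cost per token on fixed hardware is a function of sustained throughput, which batching changes dramatically.

```python
# Illustrative sketch: batching alone shifts cost per token on fixed hardware.
# GPU_COST_PER_HOUR and all throughput numbers below are assumed, not measured.

GPU_COST_PER_HOUR = 2.0  # assumed hourly cost of one accelerator, in dollars

def cost_per_million_tokens(tokens_per_second: float) -> float:
    """Dollars per 1M tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

# Same GPU, different batch sizes: throughput scales sub-linearly
# (memory-bandwidth bound at small batches, compute bound at large ones).
for batch_size, tok_s in [(1, 50), (8, 300), (64, 1500)]:
    print(f"batch={batch_size:3d}  {tok_s:5.0f} tok/s  "
          f"${cost_per_million_tokens(tok_s):.2f}/M tokens")
```

Under these made-up numbers, the same accelerator swings from roughly $11/M tokens at batch size 1 to well under $1/M at batch size 64, which is the kind of spread that architecture and serving choices produce independent of raw power costs.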