Something that usually gets missed in these discussions is that the subscription quotas seem to rely heavily on prompt caching to be economically viable, or at least less unviable. They can and do keep the system prompt, tools, skills, etc. stable enough that the first 20k or so tokens of a request hit the cache and don't consume fresh inference resources for that portion. In addition, from my monitoring, Claude Code with Max sees about an 80% cost reduction via caching (compared to what the same work would have cost with API billing), and that figure has been improving over time. If cache reads pass on a discount of around 90% relative to fresh input tokens, I think it's fair to assume the actual cost to them of serving the cached portion is close to negligible.
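To make the arithmetic concrete, here's a minimal sketch of effective input-token cost under prefix caching. It assumes Anthropic-style pricing where cache reads bill at roughly 10% of the base input rate (the ~90% discount mentioned above); the token counts and hit fraction are illustrative, not measured values.

```python
# Sketch: effective per-request input cost when part of the prompt hits
# a prefix cache. Rates and token counts below are illustrative.

def effective_input_cost(total_tokens: int, cached_fraction: float,
                         base_rate: float, cache_read_rate: float) -> float:
    """Cost of one request's input tokens, split into cached vs. fresh."""
    cached = total_tokens * cached_fraction
    fresh = total_tokens - cached
    return cached * cache_read_rate + fresh * base_rate

# Example: 25k input tokens, 20k of which (80%) hit the cache, at a
# $3 / 1M base input rate and $0.30 / 1M cache-read rate (Sonnet-class).
full = effective_input_cost(25_000, 0.0, 3e-6, 0.3e-6)  # no caching: $0.075
hit = effective_input_cost(25_000, 0.8, 3e-6, 0.3e-6)   # 80% cached: $0.021
savings = 1 - hit / full                                 # 72% cheaper
```

Note the savings on input tokens (72% here) sit below the per-token discount (90%) because the uncached 20% still bills at the full rate; an 80% overall cost reduction implies the cached fraction in practice is even higher than this example.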
So they're being obtuse about it for some reason, but if you want an economically sustainable model for AI companies, they have to have some kind of optimization behind the otherwise ridiculously discounted subscriptions. They now sell subscriptions to enterprise at the same rates and quotas, minus the $200 tier, so this isn't just consumer marketing being subsidized by B2B revenue.
Whether they're making money or just losing less, you can only get that kind of cache optimization when you control a fixed client that sends a predictable prefix.
Maybe they could charge API users less if they use the same prefix that Claude Code uses? More coding agents sharing the same prefix would mean better cache hit rates, reducing their costs.