logoalt Hacker News

mvdtnztoday at 1:56 AM2 repliesview on HN

Why do you mean by this? What cache?


Replies

mirashiitoday at 2:01 AM

Generally speaking, there's prompt caching that can be enabled in the API with things like this: https://platform.claude.com/docs/en/build-with-claude/prompt...

For a specific harness, they've all found ways to optimize to get higher cache hit rates with their harness. Common system prompts and all, and more and more users hitting cache really makes the cost of inference go down dramatically.

What bothers me about a lot of the discussion about providers disallowing other harnesses with the subscription plans around here is the complete lack of awareness of how economies of scale from common caching practices across more users can enable the higher, cheaper quotas subscriptions give you.

show 1 reply
nikcubtoday at 2:01 AM

prompt caching - big part of the reason why they can economically offer claude code plans. one of the ant team explain it here:

https://x.com/trq212/status/2024574133011673516