Caching is pretty simple: if a request shares a prefix with a recent one, that prefix is cacheable. Even so, very long context windows will be much more expensive than shorter ones, assuming you're using Claude Code or a similar harness in both cases. You'll get caching either way, but you'll pay more for the longer context. The cost of occasional compaction is more or less negligible compared to the cost of the input tokens that get charged again on every single request.
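Here's a minimal sketch of the prefix-caching mechanics using the raw Anthropic Python SDK (a harness like Claude Code does this for you under the hood). The model id and prompt are illustrative, not a recommendation; the key parts are the `cache_control` marker, which ends the cacheable prefix, and the usage fields, which show the hit/miss split:

```python
import anthropic

client = anthropic.Anthropic()

# A long, stable system prompt: this is the part that benefits from caching.
SYSTEM_PROMPT = "You are a coding agent. " + "<project context here> " * 2000

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Marks the end of the cacheable prefix. A later request whose
            # prompt matches byte-for-byte up to this point reads the prefix
            # from cache at the discounted rate instead of full input price.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What does this repo do?"}],
)

# The usage object reports how much was written to vs. read from cache.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```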
If you have 500k of context, three turns will burn ~1.5M input tokens; at 250k, ~750k; at 125k, ~375k. Claude can generate at most 32k output tokens per turn in Claude Code (and it rarely does), so despite the higher per-token price of output, costs are dominated by input tokens. Even at cached input prices, cost scales near-linearly with context length: 2x the context length is roughly 2x the cost.
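A back-of-the-envelope sketch of that scaling in Python. The $0.30/MTok cached-read rate is a placeholder (and the model ignores the cache-write surcharge on the first turn); substitute your actual rate. The point is the shape: total input cost is roughly linear in context length.

```python
# Approximate the claim above: every turn re-reads the whole context,
# so total input tokens = context_tokens * turns, and cost follows.

def session_input_cost(context_tokens: int, turns: int,
                       cached_read_per_mtok: float = 0.30) -> float:
    """Approximate input cost, assuming the full context is re-read
    (from cache, at a placeholder $/MTok rate) on every turn."""
    total_input = context_tokens * turns
    return total_input / 1_000_000 * cached_read_per_mtok

for ctx in (500_000, 250_000, 125_000):
    tokens = ctx * 3
    cost = session_input_cost(ctx, turns=3)
    print(f"{ctx:>7} ctx -> {tokens:>9,} input tokens over 3 turns "
          f"(~${cost:.2f} at the placeholder rate)")
```

Halving the context halves both lines of output, which is the near-linear relationship in question.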
Now, it may be that a longer context window lets Claude complete the task better, although I'd be surprised if many tasks required >200k tokens just to get the job done (that's nearly ten full copies of Shakespeare's "A Midsummer Night's Dream"). Longer windows are definitely convenient: you don't have to think about context management as much, or worry that a sudden, unexpected autocompact will wreck things because you weren't carefully compacting at logical points. But they're more expensive on a near-linear basis, and you're paying for that convenience.