Hacker News

aragonite · today at 2:05 AM

Do long sessions also burn through token budgets much faster?

If the chat client is resending the whole conversation on every turn, then once you're deep into a session each request already includes tens of thousands of tokens of prior context. So a message sent 70k tokens into a conversation is far "heavier" than one sent at 2k, at least in terms of input tokens. Yes?
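Roughly, yes. A minimal sketch of the effect, using made-up numbers (2k tokens per message, 35 turns):

```python
# Sketch: with a stateless chat API, each turn resends the full history,
# so input tokens per request grow with conversation depth.
# All numbers here are illustrative, not from any specific API.

def input_tokens_per_turn(message_tokens):
    """Return the context size sent on each request, assuming the
    client resends the entire prior conversation every turn."""
    history = 0
    per_turn = []
    for tokens in message_tokens:
        history += tokens
        per_turn.append(history)  # the whole history goes out again
    return per_turn

turns = input_tokens_per_turn([2_000] * 35)
print(turns[0])   # → 2000   (first request)
print(turns[-1])  # → 70000  (35th request)
print(sum(turns)) # → 1260000 total input tokens billed across the session
```

So even though the user typed only 70k tokens of messages, the session billed ~1.26M input tokens in total, because each message re-pays for everything before it.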


Replies

dathery · today at 2:16 AM

That's correct. Input caching helps, but even then, at e.g. 800k tokens with all of them cached, the API price is $0.50/MTok × 0.8M tokens = $0.40 per request, which adds up really fast. A "request" can be as small as a single tool-call response, so you can easily end up making many $0.40 requests per minute.
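Spelled out, using the commenter's numbers (a $0.50-per-million cached-input rate and an assumed 5 tool-call round trips per minute, both illustrative):

```python
# Illustrative cost math from the comment above. The cached-input price
# and the requests-per-minute figure are assumptions, not official rates.

CACHED_PRICE_PER_MTOK = 0.50     # assumed $ per 1M cached input tokens
context_tokens = 800_000         # fully cached context per request

cost_per_request = CACHED_PRICE_PER_MTOK * context_tokens / 1_000_000
print(f"${cost_per_request:.2f} per request")        # → $0.40 per request

requests_per_minute = 5          # e.g. one per tool-call round trip
print(f"${cost_per_request * requests_per_minute:.2f} per minute")  # → $2.00 per minute
```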

jasondclinton · today at 2:16 AM

If you use context caching, it saves quite a lot on costs/budgets. You can cache 900k tokens if you want.