
0-_-0 today at 12:21 PM

The cache gets read for every token generated, not on every turn of the conversation.


Replies

mzl today at 12:23 PM

Depends on which cache you mean. The KV cache is read on every token generated, but the prompt cache (which is what incurs the cache-read cost) is read when a conversation starts.
