mzl • today at 12:23 PM

Depends on which cache you mean. The KV cache is read on every generated token, but the prompt cache (which is what incurs the cache read cost) is read when a conversation starts.
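
A rough toy sketch of that distinction, in case it helps. Everything here is hypothetical (`ToyServer`, the counters, the string stand-ins for KV entries) and it is not how any real inference server is implemented. It only counts how often each cache is touched: the prompt cache is looked up once at the start of a request, while the KV cache is read on every decode step.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy model of the two caches; no real attention is computed."""
    prompt_cache: dict = field(default_factory=dict)  # prompt -> saved KV entries
    prompt_cache_reads: int = 0
    kv_cache_reads: int = 0

    def generate(self, prompt: tuple, n_new_tokens: int) -> list:
        # Prompt cache: a single lookup at the start of the request.
        if prompt in self.prompt_cache:
            self.prompt_cache_reads += 1
            kv = list(self.prompt_cache[prompt])
        else:
            # Cache miss: stand-in for running prefill over the prompt.
            kv = [f"kv({tok})" for tok in prompt]
            self.prompt_cache[prompt] = list(kv)

        out = []
        for i in range(n_new_tokens):
            # KV cache: every decode step reads the entries built so far.
            self.kv_cache_reads += 1
            tok = f"t{i}"  # placeholder for a sampled token
            out.append(tok)
            kv.append(f"kv({tok})")
        return out


server = ToyServer()
server.generate(("system", "hello"), 5)  # miss: prefill, then 5 decode steps
server.generate(("system", "hello"), 5)  # hit: 1 prompt cache read, 5 more steps
print(server.prompt_cache_reads, server.kv_cache_reads)  # 1 10
```

So in this toy model two 5-token requests give one prompt cache read but ten KV cache reads.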

Replies

0-_-0 • today at 12:24 PM

What's in the prompt cache?
