The prompt cache caches KV Cache states based on prefixes of previous prompts and conversations. Now...

mzl • yesterday at 1:59 PM • 1 reply • view on HN

The prompt cache caches KV Cache states based on prefixes of previous prompts and conversations. Now, for a particular coding agent conversation, it might be more involved in how caching works (with cache handles and so on), I'm talking about the general case here. This is a way to avoid repeating the same quadratic cost computing over the prompt. Typically, LLM providers have much lower pricing for reading from this cache than computing again.

Since the prompt cache is (by necessity, this is how LLMs work) prefix of a prompt, if you have repeated API calls in some service, there is a lot of savings possible by organizing queries to have less commonly varying things first, and more varying things later. For example, if you included the current date and time as the first data point in your call, then that would force a recomputation every time.

Replies

lostmsu • yesterday at 6:09 PM

> The prompt cache caches KV Cache states

Yes. The cache that caches KV cache states is called the KV cache. "Prompt cache" is just index from string prefixes into KV cache. It's tiny and has no computational impact. The parent was correct to question you.

The cost of using it comes from the blend of the fact that you need more compute to calculate later tokens and the fact that you have to keep KV cache entries between requests of the same user somewhere while the system processes requests of other users.

➕ show 1 reply

alt Hacker News

Replies