Hacker News

0-_-0 · today at 12:24 PM

What's in the prompt cache?


Replies

mzl · today at 1:59 PM

The prompt cache stores KV-cache states keyed by prefixes of previous prompts and conversations. For a particular coding-agent conversation the mechanics can be more involved (cache handles and so on); I'm describing the general case here. The point is to avoid repeating the same quadratic attention cost over the prompt. LLM providers typically charge much less for reading from this cache than for computing again.
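A toy sketch of the idea (illustrative only, not how any real provider implements it; real systems typically key on fixed-size token blocks and store actual KV tensors): cache entries are keyed by token prefixes, and a new prompt reuses the longest cached prefix so only the remaining suffix pays the full computation cost.

```python
# Toy prefix-based prompt cache. Cache keys are token prefixes; values
# stand in for the KV states a real serving stack would hold on the GPU.

def longest_cached_prefix(cache: dict, tokens: list) -> int:
    """Return the length of the longest prefix of `tokens` already cached."""
    best = 0
    for i in range(1, len(tokens) + 1):
        if tuple(tokens[:i]) in cache:
            best = i
    return best

def process_prompt(cache: dict, tokens: list) -> tuple[int, int]:
    """Return (cached_tokens, recomputed_tokens) and store new prefixes."""
    hit = longest_cached_prefix(cache, tokens)
    for i in range(hit + 1, len(tokens) + 1):
        cache[tuple(tokens[:i])] = f"kv-state-for-{i}-tokens"  # placeholder for KV tensors
    return hit, len(tokens) - hit

cache = {}
print(process_prompt(cache, ["system", "rules", "question-A"]))  # (0, 3): cold cache
print(process_prompt(cache, ["system", "rules", "question-B"]))  # (2, 1): shared prefix reused
```

The second call only pays for one "token" of fresh computation because its first two tokens match a cached prefix, which is exactly the saving the cheaper cache-read pricing reflects.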

Since a prompt-cache entry is (by necessity; this is how LLMs work) a prefix of a prompt, a service making repeated API calls can save a lot by ordering queries so that rarely changing content comes first and frequently varying content comes last. For example, if you included the current date and time as the first item in your prompt, that would force a full recomputation every time.
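A hedged sketch of that ordering advice (the constants and function names are illustrative, not any particular provider's API): the cache-friendly version keeps the stable system rules and documentation at the front so every call shares the same long cacheable prefix, while the cache-hostile version puts the timestamp first and changes the prefix on every call.

```python
from datetime import datetime, timezone

# Stable content that repeats across calls (illustrative placeholders).
SYSTEM_RULES = "You are a support assistant. Follow company policy X."
PRODUCT_DOCS = "Long, rarely-changing product documentation goes here."

def cache_hostile_prompt(user_question: str) -> str:
    # Timestamp first: the prefix differs on every call, defeating the cache.
    now = datetime.now(timezone.utc).isoformat()
    return f"Current time: {now}\n{SYSTEM_RULES}\n{PRODUCT_DOCS}\n{user_question}"

def cache_friendly_prompt(user_question: str) -> str:
    # Stable rules and docs first; volatile time and question last, so
    # repeated calls share the longest possible cached prefix.
    now = datetime.now(timezone.utc).isoformat()
    return f"{SYSTEM_RULES}\n{PRODUCT_DOCS}\nCurrent time: {now}\n{user_question}"
```

With the cache-friendly layout, two calls with different questions still share the entire rules-plus-docs prefix, which is where most of the tokens (and therefore most of the savings) usually are.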

bsenftner · today at 12:26 PM

Way too much. This has got to be the most expensive and least sensible way to make software ever devised.