How does caching help here? How much repetition is there in queries?

pjc50 • today at 10:05 AM • 2 replies

Replies

Agent loops (particularly coding agents) have a huge amount of repetition, because the entire context is resent with every model request. As long as the repeated part sits at the start of the input and doesn't change, it can hit the KV cache (assuming the model provider actually still has that prefix in cache).

This only works because prompt caching matches on prefixes rather than on the entire input.
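A toy sketch of why prefix matching pays off in an agent loop. It assumes a hypothetical block size of 4 tokens and a made-up chained-hash scheme. It only records which prefixes have been seen, with no real KV tensors and no eviction, so it is not any provider's actual implementation. Each block key hashes the previous key together with the current block, so a key stands for the whole prefix up to that point. Appending a new turn keeps the old keys valid, while changing an early token invalidates everything after it.

```python
from hashlib import sha256

BLOCK = 4  # hypothetical block size in tokens (real systems use larger blocks)


class PrefixCache:
    """Toy prefix cache: tracks which token-block prefixes were already computed."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def _block_keys(tokens):
        # Chain the hashes: each key covers every token up to the end of its block.
        key = b""
        keys = []
        full = len(tokens) - len(tokens) % BLOCK  # only complete blocks are cached
        for i in range(0, full, BLOCK):
            block = ",".join(map(str, tokens[i:i + BLOCK])).encode()
            key = sha256(key + b"|" + block).digest()
            keys.append(key)
        return keys

    def process(self, tokens):
        """Return how many leading tokens hit the cache, then cache all full blocks."""
        keys = self._block_keys(tokens)
        hit = 0
        for k in keys:
            if k not in self._seen:
                break  # first miss: nothing after it can match either
            hit += BLOCK
        self._seen.update(keys)
        return hit


# Agent loop: every request resends the whole context plus one new turn.
cache = PrefixCache()
context = list(range(8))  # toy token ids standing in for system prompt + tools
for turn in range(3):
    context += [100 + turn] * 4
    print(len(context), cache.process(context))  # 12 0, then 16 12, then 20 16

# Changing the very first token breaks the prefix, so nothing is reused.
print(cache.process([999] + context[1:]))  # 0
```

The hit count grows with every turn because only the tail is new. That is the "huge amount of repetition" being reused, and it is also why editing anything near the start of the context throws the benefit away.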

AnthonyMouse • today at 10:29 AM

It probably depends on what you're doing, but imagine you're running something shaped like a search engine. How many user queries are unique, and how many are the same thing someone else searched for an hour ago?
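For the search-engine case, the usual tool is an exact-match result cache rather than a prefix cache. Below is a minimal sketch, assuming a one-hour time-to-live (echoing "an hour ago"), LRU eviction, and a hypothetical normalization step (case-folding and collapsing whitespace). `QueryCache`, `get_or_compute`, and the injectable `clock` are illustrative names, not a real library. The hit rate it reports depends entirely on the workload you feed it. It says nothing about how repetitive real query traffic is.

```python
import time
from collections import OrderedDict


class QueryCache:
    """Toy exact-match result cache with a time-to-live and LRU eviction."""

    def __init__(self, ttl_seconds=3600.0, max_entries=10_000, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock  # injectable so tests can fake the passage of time
        self._entries = OrderedDict()  # normalized query -> (expiry time, result)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(query):
        # Hypothetical normalization so trivially different spellings share an entry.
        return " ".join(query.casefold().split())

    def get_or_compute(self, query, compute):
        key = self._normalize(query)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self.misses += 1  # absent or expired: do the expensive work
        result = compute(query)
        self._entries[key] = (now + self.ttl, result)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return result


fake_now = [0.0]
cache = QueryCache(ttl_seconds=3600, clock=lambda: fake_now[0])
backend = lambda q: f"results for {q!r}"  # stand-in for the expensive search
for q in ["kv cache", "KV  Cache", "prefix caching", "kv cache"]:
    cache.get_or_compute(q, backend)
print(cache.hits, cache.misses)  # 2 2
```

Every repeated query inside the TTL window skips the backend entirely. The TTL bounds how stale a cached answer can get.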
