Hacker News

pjc50 today at 10:05 AM, 2 replies

How does caching help here? How much repetition is there in queries?


Replies

jcparkyn today at 11:11 AM

Agent loops (particularly coding agents) have a huge amount of repetition, because the entire context is included in every model request. So long as the repeated context is at the start of the input and doesn't change, the request will be able to hit the KV cache (assuming the model provider actually still has that prefix in cache).

This only works because prompt caching is done by matching prefixes, not the entire input.
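
To make the prefix point concrete, here's a rough toy sketch in Python. It is not how any real provider implements prompt caching: `ToyPrefixCache` and `simulate_agent_loop` are made-up names, "tokens" are just whitespace-split words, and real systems typically cache at some block granularity with eviction. It only shows that when each request is the previous request plus a bit more, almost all of the input matches a cached prefix.

```python
from typing import List, Sequence, Tuple


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPrefixCache:
    """Hypothetical stand-in for a provider-side prompt cache (not a real API)."""

    def __init__(self) -> None:
        self._seen: List[List[str]] = []

    def lookup(self, tokens: Sequence[str]) -> int:
        # Longest prefix of `tokens` that matches the start of any earlier request.
        return max((common_prefix_len(tokens, s) for s in self._seen), default=0)

    def store(self, tokens: Sequence[str]) -> None:
        self._seen.append(list(tokens))


def simulate_agent_loop(
    system_prompt: Sequence[str], turns: Sequence[Sequence[str]]
) -> List[Tuple[int, int]]:
    """Return (cached_tokens, total_tokens) for each request in the loop."""
    cache = ToyPrefixCache()
    context: List[str] = list(system_prompt)
    stats: List[Tuple[int, int]] = []
    for turn in turns:
        # Every request resends the whole context plus the new turn.
        context = context + list(turn)
        stats.append((cache.lookup(context), len(context)))
        cache.store(context)
    return stats


if __name__ == "__main__":
    system = "you are a coding agent".split()
    turns = [["read", "file"], ["edit", "file"], ["run", "tests"]]
    for cached, total in simulate_agent_loop(system, turns):
        print(f"{cached}/{total} tokens served from cached prefix")
```

The first request misses entirely, and every later one only has to process the newly appended turn. If anything near the start of the input changes (say, a timestamp in the system prompt), the shared prefix drops to zero and the whole input is processed again.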

AnthonyMouse today at 10:29 AM

It probably depends on what you're doing, but imagine you're something in the shape of a search engine. How many user queries are unique vs. the same thing someone else searched for an hour ago?