Hacker News

maxloh · yesterday at 5:17 PM

I don't find this very viable. There are so many ways to express the same question, and context matters: the same prompt becomes irrelevant if the preceding prompts or LLM responses differ.

With the cache limited to the same organization, the chances of it actually being reused would be extremely low.


Replies

qeternity · yesterday at 9:38 PM

In a chat setting you hit the cache every time you add a new prompt: all historical question/answer pairs are part of the context and don’t need to be prefilled again.
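A toy sketch of the mechanics (the hashing scheme and names are made up; real servers like vLLM cache per-block KV state rather than whole-prefix hashes): each new turn extends the previous context, so the longest cached prefix covers everything except the newest message.

    import hashlib

    kv_cache = {}  # prefix hash -> tokens already prefilled (stand-in for KV state)

    def hkey(tokens):
        return hashlib.sha256("\x00".join(tokens).encode()).hexdigest()

    def prefill(tokens):
        # Find the longest already-cached prefix of this context.
        cached = 0
        for cut in range(len(tokens), 0, -1):
            if hkey(tokens[:cut]) in kv_cache:
                cached = cut
                break
        # Only the suffix past the cached prefix costs fresh prefill compute.
        kv_cache[hkey(tokens)] = len(tokens)
        return cached

    history = ["system: be helpful", "user: hi", "assistant: hello"]
    prefill(history)                              # first turn: full prefill
    print(prefill(history + ["user: and now?"]))  # 3 -- old turns come from cache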

On the API side, imagine you're doing document processing and have a 50k-token instruction prompt that you reuse for every document.
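That pattern is what explicit prompt caching looks like in practice. A rough sketch using Anthropic's Messages API, where a cache_control marker flags the long shared prefix as cacheable (the model name, file path, and variable names here are placeholders):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    LONG_INSTRUCTIONS = open("instructions.txt").read()  # the ~50k-token shared prefix

    def process(document_text):
        return client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model name
            max_tokens=1024,
            system=[{
                "type": "text",
                "text": LONG_INSTRUCTIONS,
                "cache_control": {"type": "ephemeral"},  # cache everything up to here
            }],
            messages=[{"role": "user", "content": document_text}],
        )

Every call after the first reuses the cached instruction prefix; only the per-document suffix is prefilled and billed at the full input rate.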

It’s extremely viable and used all the time.

IanCal · yesterday at 6:04 PM

It gets used massively in a conversation; also, anything that explains a lot of actions in the system prompt means you have a large matching prefix.

babelfish · yesterday at 5:50 PM

Think of it as a very useful prefix match. If all of your threads start with the same system prompt, you will reap benefits from prompt caching.
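A minimal illustration of that prefix match (character-level here for brevity; a real cache matches at the token or block level, and the strings are invented):

    from os.path import commonprefix

    SYSTEM = "You are a support agent for ExampleCo. Follow the policy below.\n"
    thread_a = SYSTEM + "user: my order is late"
    thread_b = SYSTEM + "user: how do I reset my password?"

    shared = commonprefix([thread_a, thread_b])
    print(f"{len(shared)} of {len(thread_b)} chars reusable")  # the whole system prompt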