CommieBobDole • today at 2:19 AM

Right, but that's still external to the LLM. It's just a KV cache stored on the provider side for performance reasons, so that the client doesn't have to re-send the whole chat history with every subsequent call in the conversation.

The model still generates every response from its pristine state on each new API call; whether the context is provided by the client or by a colocated cache server doesn't really change that.
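To make that concrete, here's a rough toy sketch in Python. Everything in it is made up for illustration: `generate` is a hash-based stand-in for a forward pass, `WEIGHTS` is a placeholder for the frozen model, and `ProviderSideCache` stores plain message history rather than real attention keys/values. It only shows the shape of the argument: the reply is a pure function of fixed weights plus context, so it doesn't matter who holds the context between calls.

```python
import hashlib

# Hypothetical frozen "weights": nothing below ever modifies them.
WEIGHTS = b"frozen-model-weights"


def generate(context: tuple) -> str:
    """Toy stand-in for a forward pass: a pure function of weights and context."""
    payload = WEIGHTS + "\n".join(context).encode("utf-8")
    return "reply-" + hashlib.sha256(payload).hexdigest()[:8]


class ProviderSideCache:
    """Hypothetical server-side store holding each conversation's context."""

    def __init__(self) -> None:
        self._contexts = {}

    def call(self, conversation_id: str, new_message: str) -> str:
        # The cache supplies the earlier turns; the client sends only the new one.
        context = self._contexts.get(conversation_id, ()) + (new_message,)
        reply = generate(context)
        self._contexts[conversation_id] = context + (reply,)
        return reply


def client_side_chat(messages: list) -> list:
    """Client keeps the history and re-sends all of it on every call."""
    history = ()
    replies = []
    for message in messages:
        history += (message,)
        reply = generate(history)
        history += (reply,)
        replies.append(reply)
    return replies


def cached_chat(messages: list) -> list:
    """Same conversation, but the provider-side cache holds the history."""
    cache = ProviderSideCache()
    return [cache.call("conv-1", message) for message in messages]
```

Running `client_side_chat(msgs)` and `cached_chat(msgs)` on the same list of messages gives identical replies, and `WEIGHTS` is the same before and after. All the state lives in the context store, wherever that happens to sit.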
