The KV cache is still quite small compared to the weights. It can stay in memory for reasonable context lengths, or be streamed to storage as a last resort. Streaming it doesn't hurt performance much, since throughput is already limited by having to stream in the much larger weights.
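
To put rough numbers on that comparison, here is a minimal back-of-the-envelope sketch. It assumes a standard transformer that caches one key and one value vector per layer, per KV head, per token. The model shape used below (32 layers, 8 KV heads, head dimension 128, 7 billion parameters, 16-bit values, 8192-token context) is a hypothetical example chosen for illustration, not a figure taken from the text above, and the function names are made up for this sketch.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    """Approximate KV-cache size for a single sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def weight_bytes(n_params, bytes_per_param):
    """Approximate size of the model weights."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                        seq_len=8192, bytes_per_value=2)
    w = weight_bytes(n_params=7_000_000_000, bytes_per_param=2)

    print(f"KV cache: {kv / 2**30:.2f} GiB")
    print(f"Weights:  {w / 2**30:.2f} GiB")
    print(f"KV cache is {kv / w:.1%} of the weights")
```

Under these assumptions the cache is a small fraction of the weights, so moving it adds little traffic on top of what the weights already require. The ratio grows linearly with context length and batch size, so very long contexts or large batches can change the picture.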