zozbot234 • yesterday at 10:14 PM

The KV cache is still quite small compared to the weights. It can stay in memory for reasonable context lengths, or be streamed to storage as a last resort. That doesn't hurt performance much, since we're already bottlenecked on streaming in the much larger weights.
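
To put rough numbers on this, here's a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dim 128, 8B params, fp16 everywhere, 8K context) is a made-up example for illustration, not any particular model's config, and the function names are just mine:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the KV cache for one sequence.

    The leading factor of 2 covers one key tensor and one value tensor
    per layer; each holds n_kv_heads * head_dim elements per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def weight_bytes(n_params, bytes_per_param=2):
    """Size of the weights at a fixed precision (2 bytes = fp16/bf16)."""
    return n_params * bytes_per_param


# Hypothetical model shape, chosen only to illustrate the ratio.
kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)
w = weight_bytes(8_000_000_000)

print(f"KV cache: {kv / 2**30:.2f} GiB")
print(f"Weights:  {w / 2**30:.2f} GiB")
print(f"KV / weights: {kv / w:.3f}")
```

With these assumed numbers the cache comes out well under a tenth of the weights, so reading it back per token adds little on top of the weight traffic. The cache grows linearly with context length (and with batch size), so the ratio gets worse for very long contexts or models without grouped KV heads.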