Hacker News

x_may, today at 11:48 AM

It's KV cache compression, so it reduces how much memory the model needs as its context grows. It does not affect the size of the weights.
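
To put rough numbers on that, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128, fp16 values) and the 4x compression ratio are made-up assumptions for illustration, not figures from any particular model or method.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Uncompressed KV cache size for one sequence.

    The leading 2 accounts for storing both a key and a value
    vector per token, per layer, per KV head.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM, FP16_BYTES = 32, 8, 128, 2

# Hypothetical compression ratio; real methods vary.
COMPRESSION_RATIO = 4

for tokens in (8_192, 32_768, 131_072):
    full = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM, FP16_BYTES)
    compressed = full // COMPRESSION_RATIO
    print(f"{tokens:>7} tokens: {full / 2**30:.2f} GiB -> {compressed / 2**30:.2f} GiB")
```

The cache grows linearly with the number of tokens, which is why compressing it matters for long contexts. Nothing in the calculation involves the parameter count, so the weights take up the same memory either way.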