kgeist · yesterday at 2:28 PM

Llama.cpp already uses an idea from it internally for the KV cache [0].

So a quantized KV cache should now see less degradation.
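The degradation in question comes from quantization roundtrip error: storing KV cache values in a low-bit format and dequantizing them back loses precision. A minimal sketch of that effect, using a generic per-block int8 absmax scheme (an illustration only, not llama.cpp's actual cache format):

```python
import numpy as np

def quantize_q8(block):
    """Per-block absmax int8 quantization: one fp32 scale per block."""
    amax = np.max(np.abs(block))
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8(q, scale):
    """Reconstruct approximate fp32 values from int8 codes."""
    return q.astype(np.float32) * scale

# Simulated KV cache slice: 4 blocks of 32 values each.
rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 32)).astype(np.float32)

recon = np.stack([dequantize_q8(*quantize_q8(b)) for b in kv])
err = float(np.max(np.abs(kv - recon)))
print(f"max roundtrip error: {err:.5f}")
```

The worst-case error per block is half a quantization step (scale / 2), so any change that shrinks the effective dynamic range of the cached values directly shrinks this error.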

[0] https://github.com/ggml-org/llama.cpp/pull/21038