Hacker News

kennethops · yesterday at 2:11 PM

Do you know if they did this to it?

https://research.google/blog/turboquant-redefining-ai-effici...


Replies

kgeist · yesterday at 2:28 PM

Llama.cpp already uses an idea from it internally for the KV cache [0].

So a quantized KV cache should now see less degradation.

[0] https://github.com/ggml-org/llama.cpp/pull/21038
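For context, a toy sketch of what "quantized KV cache" means here. This is not llama.cpp's actual code or the technique from the linked PR; it is a minimal illustration of block-wise symmetric 8-bit quantization in the style of the q8_0 cache type llama.cpp supports (enabled, if I recall correctly, via the `--cache-type-k`/`--cache-type-v` flags): each block of 32 values shares one float scale, and each value is stored as an int8.

```python
# Toy sketch of q8_0-style block quantization (NOT llama.cpp's implementation).
# Each block of 32 floats is stored as one fp32 scale plus 32 int8 values,
# cutting KV-cache memory roughly 4x versus fp32 at the cost of some precision.

BLOCK = 32

def quantize_q8_0(values):
    """Quantize a flat list of floats into (scale, int8 list) blocks."""
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0 if amax > 0 else 1.0
        qs = [max(-128, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, qs))
    return blocks

def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from quantized blocks."""
    return [q * scale for scale, qs in blocks for q in qs]

if __name__ == "__main__":
    import random
    random.seed(0)
    kv = [random.uniform(-1, 1) for _ in range(64)]  # stand-in KV-cache slice
    restored = dequantize_q8_0(quantize_q8_0(kv))
    max_err = max(abs(a - b) for a, b in zip(kv, restored))
    print(f"max round-trip error: {max_err:.4f}")
```

The per-value error is bounded by half the block's quantization step (scale / 2), so blocks whose values have similar magnitudes round-trip well; the degradation the comment refers to comes from this rounding accumulating across attention reads.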