do you know if they did this to it?
https://research.google/blog/turboquant-redefining-ai-effici...
Llama.cpp already uses an idea from it internally for the KV cache [0]
So a quantized KV cache should now see less degradation
[0] https://github.com/ggml-org/llama.cpp/pull/21038