Better to keep the KV cache in full precision.
Wow, the GOAT himself! Thank you so much for creating llama.cpp. Will re-deploy with a full-precision KV cache once requests stop coming in.