KVarN: Native vLLM backend for KV-cache quantization by Huawei

51 points • by theanonymousone • today at 3:18 PM • 7 comments

throwa356262 • today at 3:54 PM

Better performance than TQ and better quality than FP16?

Am I reading this right?

v3ss0n • today at 3:53 PM

Why is this not a PR for vLLM?

shockembopper • today at 5:17 PM

[dead]
