The paper is about vector quantization, which affects the KV cache, not the model weights or model size.

rohansood15 • today at 4:38 PM
