Hacker News

veunes · today at 9:52 AM

Nah, those are completely different beasts. DeepSeek's MLA solves the KV-cache problem via low-rank projection - they literally squeeze the keys and values through a small latent vector at train time. TurboQuant is post-training quantization: it mathematically compresses existing weights and activations using polar coordinates.
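To make the MLA side concrete, here's a minimal sketch of the low-rank idea - compress the hidden state to a small latent, cache only the latent, and up-project to K/V at attention time. All shapes and weight names here are illustrative, not DeepSeek's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent, d_head = 1024, 64, 128  # illustrative sizes

# Low-rank projection weights (learned at train time in the real model).
W_down = rng.standard_normal((d_model, d_latent)) * 0.02
W_up_k = rng.standard_normal((d_latent, d_head)) * 0.02
W_up_v = rng.standard_normal((d_latent, d_head)) * 0.02

h = rng.standard_normal(d_model)   # hidden state for one token
c = h @ W_down                     # latent vector: this is all that gets cached
k = c @ W_up_k                     # keys reconstructed on use
v = c @ W_up_v                     # values reconstructed on use

# Per-token cache shrinks from 2 * d_head floats (K and V) to d_latent floats.
print(c.shape, k.shape, v.shape)   # (64,) (128,) (128,)
```

The point is that the compression is baked into the architecture and trained end to end, rather than applied to a finished model.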


Replies

esafak · today at 1:36 PM

No, it is about compressing the KV cache; see the "How TurboQuant works" section.
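For contrast with the trained low-rank approach, here's a generic sketch of what post-training quantization of a cached key vector looks like - plain per-vector int8 rounding, not TurboQuant's actual algorithm, just the general shape of quantizing a KV cache after training:

```python
import numpy as np

rng = np.random.default_rng(1)

# One key vector as it would sit in the KV cache (float32).
k = rng.standard_normal(128).astype(np.float32)

# Per-vector scale so the largest magnitude maps to 127.
scale = np.abs(k).max() / 127.0

k_q = np.round(k / scale).astype(np.int8)   # stored form: 1 byte per value
k_hat = k_q.astype(np.float32) * scale      # dequantized when attention runs

err = np.abs(k - k_hat).max()
print(k_q.dtype, err < scale)   # rounding error stays below one scale step
```

Either way the thing being shrunk is the same KV cache; the difference is whether the compression is learned during training or applied afterward.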