Hacker News

lucrbvi · today at 8:50 AM

Sounds like Multi-Head Latent Attention (MLA) from DeepSeek


Replies

veunes · today at 9:52 AM

Nah, those are completely different beasts. DeepSeek's MLA tackles the KV-cache size problem via low-rank projection: at train time, keys and values are squeezed through a small latent vector, and only that latent is cached. TurboQuant is post-training quantization: it mathematically compresses the existing weights and activations of an already-trained model, using a polar-coordinate representation.
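The low-rank idea behind MLA can be sketched in a few lines of numpy. Dimensions and weight names here are made up for illustration, not DeepSeek's actual configuration; the point is that the cache stores only the small latent, and full-size keys/values are reconstructed at attention time:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, d_head = 512, 64, 128  # hypothetical sizes

# Down-projection (learned at train time): hidden state -> small latent.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
# Up-projections reconstruct full-size keys/values from the latent.
W_up_k = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)

h = rng.standard_normal(d_model)   # hidden state for one token
c = h @ W_down                     # latent vector: this is all we cache
k = c @ W_up_k                     # key, reconstructed at attention time
v = c @ W_up_v                     # value, reconstructed at attention time

# Per-token cache shrinks from 2 * d_head floats (K and V) to d_latent.
print(2 * d_head, "->", c.shape[0])
```

In this toy setup the per-token cache cost drops from 256 floats to 64, which is the whole trick: the compression is baked into the architecture during training, whereas PTQ methods like TurboQuant quantize a finished model after the fact.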
