logoalt Hacker News

macleginntoday at 11:14 AM2 repliesview on HN

"TurboQuant proved it can quantize the key-value cache to just 3 bits without requiring training or fine-tuning and causing any compromise in model accuracy" -- what do each 3 bits correspond to? Hardly individual keys or values, since it would limit each of them to 8 different vectors.


Replies

carlosvegatoday at 12:40 PM

Is the number of bits per coordinate. So, 1 bit is 2x2 grid. 3 bit is a 64 cell grid (2^3 x 2^3). Here you have a demo.

https://mesuvash.github.io/blog/2026/turboquant-interactive/

jbellistoday at 11:42 AM

The explanation is terrible, but it's clear that it's not actually lossless.