"video compression" by analogy only, what this claims to actually do is delta encode the values in each token from the previous token.
Interesting idea, but the results seem almost suspicious? even accounting for the extra bits used to store the 16-bit start value for each block - ~5% for k=64
The code does funky things, like the encoder updates the reference value for each encoded token, using the non-quantized value! [1] But the decoder just ignored all that. [2] how can this work?
[1] https://github.com/cenconq25/delta-compress-llm/commit/f185f...
[2] https://github.com/cenconq25/delta-compress-llm/commit/f185f...
It looks largely LLM-generated. I'm somewhat skeptical about the results as well, hopefully someone can independently confirm that the technique works.