Hacker News

eru today at 8:39 AM

No, it's not for lossy compression only.

An LLM can give you a probability distribution for the next token. You can pair that with arithmetic coding to get a lossless compression/decompression algorithm. See https://en.wikipedia.org/wiki/Arithmetic_coding
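To make that concrete, here is a minimal sketch of the idea. It is not a real LLM compressor: `toy_model` is a hypothetical stand-in for the LLM (it just counts the symbols seen so far), the three-symbol alphabet and the `$` end marker are assumptions for illustration, and exact fractions replace the fixed-precision arithmetic a practical coder would use. The only requirement is that encoder and decoder query the same deterministic model with the same context.

```python
import math
from fractions import Fraction

ALPHABET = "ab$"  # assumed toy alphabet; '$' marks end of message


def toy_model(context):
    """Hypothetical stand-in for an LLM: next-symbol probabilities
    from Laplace-smoothed counts of the symbols seen so far."""
    counts = {s: 1 + context.count(s) for s in ALPHABET}
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}


def encode(message):
    """Narrow [low, low + width) once per symbol, then return the
    shortest k-bit binary fraction code / 2**k inside the interval."""
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(message):
        probs = toy_model(message[:i])
        for s in ALPHABET:
            if s == sym:
                width *= probs[s]
                break
            low += width * probs[s]  # skip sub-intervals before sym
    nbits = 0
    while Fraction(1, 2 ** nbits) > width:
        nbits += 1
    code = math.ceil(low * 2 ** nbits)
    return code, nbits


def decode(code, nbits):
    """Replay the same model and pick, at each step, the sub-interval
    that contains the encoded value, until the end marker appears."""
    value = Fraction(code, 2 ** nbits)
    low, width = Fraction(0), Fraction(1)
    out = ""
    while not out.endswith("$"):
        probs = toy_model(out)
        for s in ALPHABET:
            high = low + width * probs[s]
            if value < high:
                width *= probs[s]
                out += s
                break
            low = high
    return out


message = "abbabbbb$"
code, nbits = encode(message)
restored = decode(code, nbits)
```

The round trip is exact: `restored` equals `message`, and nothing the model "predicts" ever appears in the output. Swapping `toy_model` for a better predictor only shrinks `nbits`.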


Replies

adrian_b today at 9:36 AM

Done the way you describe, lossless data compression is indeed possible, but the LLM is then used very differently from how it is used in applications like chat or coding assistance.

In those applications, you issue queries that aim to extract information from the training data set, but which may return hallucinated content instead of correct content.

If you use an LLM only to estimate the probabilities of the tokens in an input data stream, and then use those estimates to encode the input data, you do not care which tokens the LLM itself would have predicted, because they are never used as output. The worst effect of any wrong predictions by the LLM is a slightly worse data compression ratio than the optimum.
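A rough way to see the cost of a wrong estimate: an ideal entropy coder spends about -log2(p) bits on a symbol to which the model assigned probability p. The sketch below compares two assumed, hand-written probability tables on the same input (the message, the tables, and the function name are made up for illustration). Both would decode losslessly; the poorer estimate just pays more bits.

```python
import math


def code_length_bits(message, probs):
    """Ideal total code length in bits when each symbol s is coded
    with the model's estimated probability probs[s]."""
    return sum(-math.log2(probs[s]) for s in message)


message = "aaaaaaab"

# Assumed estimates: one matches the data, the other is uninformative.
good_estimate = {"a": 7 / 8, "b": 1 / 8}
poor_estimate = {"a": 1 / 2, "b": 1 / 2}

good_bits = code_length_bits(message, good_estimate)
poor_bits = code_length_bits(message, poor_estimate)

print(f"good estimate: {good_bits:.2f} bits")
print(f"poor estimate: {poor_bits:.2f} bits")
```

The poor estimate costs exactly 1 bit per symbol here, while the matching estimate needs fewer bits in total. In neither case is any symbol lost or altered.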

When it is said that LLMs perform lossy data compression, that refers to the compression from the training data set to sequences of output tokens.