When LLMs predict the next token, they actually produce a distribution of the probability of each of...

Scaevolus • yesterday at 9:24 PM • 0 replies • view on HN

When LLMs predict the next token, they actually produce a distribution of the probability of each of the possible next tokens, and the sampler chooses one of them, and not necessarily the most likely one!

If instead you run LLM prediction and then encode the probability of the next token of the input text you want to encode (from the cumulative distribution, a number in [0, 1]) using arithmetic coding, you can run the same operation in reverse to achieve lossless compression.

The tricky part is ensuring that your LLM executes absolutely deterministically, because you need to make sure that the encoder and decoder have the same probability distribution map at each step.

alt Hacker News