"compressed size" does not seem to include the size of the model and the code to run it. According to the rules of Large Text Compression Benchmark, total size of those must be counted, otherwise a 0-byte "compressed" file with a decompressor containing the plaintext would win.
True for competitions, but if your compression algorithm is general purpose then this matters less (within reason - no one wants to lug around a 1TB compression program).
Yeah, but the size of the xz binary is also not counted in those bytes... Here the "program" is the LLM, much like your brain remembers things by encoding them in a compressed form and then reconstructing them. It is a different type of compression: compression by "understanding", which requires some representation of the whole space of possible inputs. The comparison is not fair to classical algorithms, yet that's how you can compress a lot more (for a particular language): by having a model of it.
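To make that concrete, here is a minimal sketch of "compression by having a model" (a toy character-bigram model standing in for the LLM; nothing here comes from the thread): an ideal entropy coder driven by the model spends roughly -log2 p bits per symbol, so text the model predicts well shrinks a lot, while the model itself sits outside the per-message byte count.

```python
import math
from collections import defaultdict

def train_bigram(corpus):
    """Toy 'model of the language': character-bigram counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, ch in zip(corpus, corpus[1:]):
        counts[prev][ch] += 1
    return counts

def message_bits(model, text, alphabet=256):
    """Shannon bound: an ideal coder driven by the model spends about
    -log2 p(next char | prev char) bits per character (Laplace smoothing)."""
    bits = 8.0  # first character sent raw
    for prev, ch in zip(text, text[1:]):
        row = model.get(prev, {})
        p = (row.get(ch, 0) + 1) / (sum(row.values()) + alphabet)
        bits += -math.log2(p)
    return bits

corpus = "the cat sat on the mat. the dog sat on the log. " * 500  # stand-in corpus
model = train_bigram(corpus)
msg = "the dog sat on the mat."
print(f"model-coded: ~{message_bits(model, msg) / 8:.1f} bytes vs {len(msg)} raw bytes")
# The counts table (the analogue of the LLM weights) is not charged to the
# message, which is exactly the accounting question in this thread.
```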
Technically correct, but a better benchmark would be a known compressor evaluated on an unknown set of inputs drawn from a real-world population, e.g. coherent English text.
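A sketch of what that benchmark could look like, assuming Python's lzma as the fixed, known compressor and a few made-up sentences standing in for inputs sampled from the real-world population:

```python
import lzma

def avg_compressed_size(compress, samples):
    """Score a fixed compressor on inputs it has never seen."""
    sizes = [len(compress(s.encode("utf-8"))) for s in samples]
    return sum(sizes) / len(sizes)

held_out = [
    "The committee will meet again on Thursday to review the proposal.",
    "Rainfall across the region was well below the seasonal average.",
    "She finished the marathon despite a sprained ankle.",
]  # stand-in for sentences drawn from a real corpus

xz_avg = avg_compressed_size(lambda b: lzma.compress(b, preset=9), held_out)
print(f"xz: {xz_avg:.1f} bytes/sample")
```

A model-based compressor could be scored by the same harness, and the model (or decompressor) size would then be amortized over the whole population of inputs rather than charged against a single file.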