Compression can be generalized as probability modelling (prediction) plus entropy coding of the data under the predicted distribution. The entropy-coding half is essentially a solved problem — arithmetic coding gets within a fraction of a bit of optimal — so nearly all the remaining leverage is in the quality of the prediction.
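To make that concrete, here's a minimal sketch (illustrative, not a real codec) of the "prediction + entropy coding" split: an adaptive order-0 byte model, where an ideal entropy coder would spend exactly -log2 p bits per symbol. The model and text are my own toy choices, but the accounting is the standard information-theoretic one:

```python
import math
from collections import Counter

def ideal_code_length(data: bytes) -> float:
    """Bits an optimal entropy coder would need, given an adaptive
    order-0 model (Laplace-smoothed byte counts)."""
    counts = Counter({b: 1 for b in range(256)})  # start every byte at 1
    total = 256
    bits = 0.0
    for b in data:
        p = counts[b] / total          # the model's prediction for this byte
        bits += -math.log2(p)          # optimal coder spends -log2 p bits on it
        counts[b] += 1                 # then update the model (decoder does the same)
        total += 1
    return bits

text = b"abracadabra " * 10
print(f"{ideal_code_length(text):.0f} bits vs {len(text) * 8} raw bits")
```

A better model (higher-order contexts, or an LLM) assigns higher probabilities to the actual next symbols, which directly shrinks the -log2 p terms; the entropy coder itself never changes.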
So yes, LLMs are nearly ideal text compressors, except for the practical inconveniences of their size and speed. (They can also be made reliably deterministic — which matters, because the decoder has to reproduce the encoder's probabilities bit-for-bit — if you sacrifice parallel execution and some optimizations.)
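The "nearly ideal" claim is just cross-entropy accounting: whatever probability the model assigns to each actual next token, an arithmetic coder driven by those probabilities emits about -log2 p bits for it. A sketch with made-up numbers (a real measurement would query an actual model's logits):

```python
import math

# Hypothetical per-token probabilities an LLM might assign to a short text.
# These values are invented for illustration only.
token_probs = [0.02, 0.4, 0.9, 0.75, 0.6, 0.95, 0.3]
n_chars = 30  # assumed character length of the text these tokens cover

bits = sum(-math.log2(p) for p in token_probs)
print(f"{bits:.1f} bits total, {bits / n_chars:.2f} bits/char")
```

An arithmetic coder would land within a couple of bits of that total, and the decoder recovers the text by re-running the same model step by step — which is exactly why the determinism caveat above matters.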