Yes of course, I don't think anyone will disagree with that. My comment had nothing to do with meaning but was about the mechanics of compression.
That said, lexical and syntactic patterns are often enough for classification and clustering in a scenario where the meaning-to-lexicons mapping is fixed.
The reason compression based classifiers trail a little behind classifiers built from first principles, even in this fixed mapping case, is a little subtle.
Optimal compression requires correct probability estimation. Correct probability estimation will yield optimal classifier. In other words, optimal compressors, equivalently correct probability estimators are sufficient.
They are however not necessary. One can obtain the theoretical best classifier without estimating the probabilities correctly.
So in the context of classification, compressors are solving a task that is much much harder than necessary.