Hacker News

srean today at 7:30 AM

I do not know the inner details of Zstandard, but I would expect it to at least do suffix/prefix stats or word-fragment stats, not just words and phrases.


Replies

Jaxan today at 10:59 AM

The thing is that two English texts on completely different topics will compress better together than, say, an English and a Spanish text on exactly the same topic. So compression really only looks at the form/shape of the text, not its meaning.
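
A rough way to try this out is to measure how many extra compressed bytes a text costs when it follows some other text in the same stream. The sketch below is only an illustration under assumptions: it uses `zlib` from the Python standard library as a stand-in for Zstandard, the helper `extra_bytes` is a hypothetical name, and the three sample texts are made up. The numbers depend heavily on the inputs, and samples this short give only a weak signal.

```python
import zlib

def extra_bytes(context: bytes, text: bytes) -> int:
    """Compressed bytes added when `text` follows `context` in one stream."""
    with_text = len(zlib.compress(context + text, 9))
    context_only = len(zlib.compress(context, 9))
    return with_text - context_only

# Made-up sample texts, purely for illustration.
english_cooking = b"Chop the onions and fry them in the pan until they are soft and golden."
english_sailing = b"Raise the sail and steer the boat into the wind until the harbour is in sight."
spanish_sailing = b"Iza la vela y dirige el barco hacia el viento hasta que el puerto este a la vista."

# Cost of the English sailing text after an English text on another topic,
# versus after a Spanish text on the same topic.
print("after English, other topic:", extra_bytes(english_cooking, english_sailing))
print("after Spanish, same topic: ", extra_bytes(spanish_sailing, english_sailing))
```

A lower number means the compressor found more reusable byte sequences in the preceding text. Longer, real documents would be needed to say anything reliable.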

duskwuff today at 9:01 AM

It's not specifically aware of the syntax - it'll match any repeated substrings. That just happens to usually end up meaning words and phrases in English text.
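
A small sketch of that point, assuming `zlib` (DEFLATE, another LZ77-style compressor from the Python standard library) as a stand-in for Zstandard; the two differ in their match finders and entropy coders. The input is pseudo-random bytes, so there are no words or syntax to recognize, yet an exact repeat still compresses far better than unrelated data of the same length. The helper `noise` and the sizes chosen are illustrative assumptions.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

def noise(n: int) -> bytes:
    """Return n pseudo-random bytes: no words, no syntax, no meaning."""
    return bytes(rng.getrandbits(8) for _ in range(n))

chunk = noise(200)
repeated = chunk + chunk        # second half is an exact repeat of the first
unrelated = chunk + noise(200)  # same length, but nothing repeats

repeated_size = len(zlib.compress(repeated, 9))
unrelated_size = len(zlib.compress(unrelated, 9))

print("input length:   ", len(repeated))
print("with a repeat:  ", repeated_size)
print("without repeats:", unrelated_size)
```

The repeated half is encoded as a back-reference to the first half, which is the same mechanism that picks up recurring words and phrases in English text.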