duskwuff • today at 2:03 AM • 2 replies

Concur. Zstandard is a good compressor, but it's not magical; comparing the compressed size of Zstd(A+B) to the combined size of Zstd(A) + Zstd(B) is effectively just a complicated way of measuring how many words and phrases the two documents have in common. That isn't entirely ineffective for judging whether they're about the same topic, but it's an unnecessarily complex and easily confused way of doing it.
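Here is a minimal sketch of the measurement being criticized. It uses zlib's DEFLATE as a stand-in for Zstd so it runs with only the standard library, since both are LZ77-style compressors. `csize`, `joint_ratio` and the sample sentences are illustrative names, not anything from the thread. `ncd` is the standard normalized compression distance, included for comparison.

```python
import zlib

# Assumption: zlib (DEFLATE, LZ77 + Huffman) stands in for Zstd here.
def csize(data: bytes) -> int:
    """Length of data after compression at zlib's maximum level."""
    return len(zlib.compress(data, 9))

def joint_ratio(a: str, b: str) -> float:
    """C(A+B) / (C(A) + C(B)): lower means more byte runs are shared."""
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    return csize(ab + bb) / (csize(ab) + csize(bb))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: (C(AB) - min) / max."""
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    ca, cb, cab = csize(ab), csize(bb), csize(ab + bb)
    return (cab - min(ca, cb)) / max(ca, cb)

original = "The quick brown fox jumps over the lazy dog near the river bank at dawn."
# Same wording with one word swapped, so long byte runs are shared.
near_copy = "The quick brown fox jumps over the lazy cat near the river bank at dawn."
# Roughly the same meaning, but almost no shared byte runs.
paraphrase = "At sunrise, a swift auburn vixen leapt across a sleeping hound beside the stream."

for label, other in [("near copy", near_copy), ("paraphrase", paraphrase)]:
    print(f"{label}: ratio={joint_ratio(original, other):.2f} ncd={ncd(original, other):.2f}")
```

On inputs like these, the near copy should score much lower than the paraphrase even though the paraphrase is arguably closer in meaning. That is the point being made: the compressor only rewards repeated byte sequences.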

Replies

srean • today at 7:30 AM

I don't know the inner details of Zstandard, but I would expect it to at least do suffix/prefix stats or word-fragment stats, not just words and phrases.
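To check that intuition, here is a minimal sketch. It again uses zlib's DEFLATE as an LZ77-style stand-in for Zstd. LZ77 matches are arbitrary byte substrings, so a shared word fragment gets reused even when no whole word repeats. The strings are made-up examples, and `csize` is a hypothetical helper.

```python
import zlib

# Assumption: zlib (DEFLATE) stands in for Zstd; both find LZ77 matches.
def csize(text: str) -> int:
    """Compressed length of text at zlib's maximum level."""
    return len(zlib.compress(text.encode("utf-8"), 9))

# The second word repeats the suffix "ationalization" of the first,
# but the two words as a whole are different.
shared_fragment = "nationalization rationalization"
# Control: same length, but the second half shares no fragment with the first.
no_fragment = "nationalization xqzjvkwpbmfdgyh"

print(len(shared_fragment), csize(shared_fragment))
print(len(no_fragment), csize(no_fragment))
```

The first string should compress noticeably smaller, because the 14-byte suffix can be replaced by one back-reference. No word boundary is involved in finding that match.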

D-Machine • today at 6:01 AM

Yup. Data compression ≠ semantic compression.