I thought the same, so I ran brotli and zstd on some PDFs I had lying around.
brotli 1.0.7 args: -q 11 -w 24
zstd v1.5.0 args: --ultra -22 --long=31
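For anyone reproducing this, the full invocations would look roughly like the following (the flags are the ones above; the file name is just an example):

  brotli -q 11 -w 24 -o sample.pdf.br sample.pdf
  zstd --ultra -22 --long=31 -o sample.pdf.zst sample.pdf
  ls -lh sample.pdf*

Both tools keep the input file by default, so the ls at the end shows original and compressed sizes side by side.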
File           | Original | zstd  | brotli
RandomBook.pdf | 15M      | 4.6M  | 4.5M
Invoice.pdf    | 19.3K    | 16.3K | 16.1K
I made a table because I wanted to test more files, but almost all the PDFs I downloaded or had stored locally were already internally compressed, and I couldn't quickly find a way to decompress them. Brotli seemed to have a very slight edge over zstd, even on the larger PDF, which I did not expect.
> I couldn't quickly find a way to decompress them
  pdftk in.pdf output out.pdf uncompress

Does your source .pdf material have FlateDecode'd chunks, or did you fully uncompress it?
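For a full round trip, one could strip the existing stream compression first and then rerun both compressors on the raw file, e.g. (file name is a placeholder):

  pdftk sample.pdf output sample.raw.pdf uncompress
  brotli -q 11 -w 24 sample.raw.pdf
  zstd --ultra -22 --long=31 sample.raw.pdf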
What's the assumption we could target as the reason for the counter-intuitive result?
That the data in PDF files is noisy, and zstd should perform better on noisy data?
EDIT: Something weird is going on here. When running zstd in parallel it produces the garbage results seen here, but when compressing on a single core it produces a result competitive with Brotli (37M). See: https://news.ycombinator.com/item?id=46723158
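If someone wants to reproduce the single-core vs. parallel difference, zstd's standard threading flags should be enough to toggle it (output names are just examples):

  zstd -T0 --ultra -22 --long=31 big.pdf -o multi.pdf.zst    # one worker thread per core
  zstd --single-thread --ultra -22 --long=31 big.pdf -o single.pdf.zst    # no worker threads

Worth noting: multithreaded zstd splits the input into independent jobs, which can cost compression ratio on large inputs, so a gap between the two modes is plausible even without a bug.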
I did my own testing, in which Brotli also came out ahead of zstd: https://news.ycombinator.com/item?id=46722044
Results by compression type across 55 PDFs: