logoalt Hacker News

atiedebeeyesterday at 3:49 PM4 repliesview on HN

I thought the same, so I ran brotli and zstd on some PDFs I had laying around.

  brotli 1.0.7 args: -q 11 -w 24
  zstd v1.5.0  args: --ultra -22 --long=31 
                 | Original | zstd    | brotli
  RandomBook.pdf | 15M      | 4.6M    | 4.5M
  Invoice.pdf    | 19.3K    | 16.3K   | 16.1K
I made a table because I wanted to test more files, but almost all PDFs I downloaded/had stored locally were already compressed and I couldn't quickly find a way to decompress them.

Brotli seemed to have a very slight edge over zstd, even on the larger pdf, which I did not expect.


Replies

mort96yesterday at 5:29 PM

EDIT: Something weird is going on here. When compressing zstd in parallel it produces the garbage results seen here, but when compressing on a single core, it produces result competitive with Brotli (37M). See: https://news.ycombinator.com/item?id=46723158

I did my own testing where Brotli also ended up better than ZSTD: https://news.ycombinator.com/item?id=46722044

Results by compression type across 55 PDFs:

    +------+------+-----+------+--------+
    | none | zstd | xz  | gzip | brotli |
    +------|------|-----|------|--------|
    | 47M  | 45M  | 39M | 38M  | 37M    |
    +------+------+-----+------+--------+
show 1 reply
mrspuraticyesterday at 6:16 PM

> I couldn't quickly find a way to decompress them

    pdftk in.pdf output out.pdf decompress
Thoreandanyesterday at 7:57 PM

Does your source .pdf material have FlateDecode'd chunks or did you fully uncompress it?

order-mattersyesterday at 4:47 PM

Whats the assumption we can potentially target as reason for the counter-intuitive result?

that data in pdf files are noisy and zstd should perform better on noisy files?

show 1 reply