Hacker News

bhouston today at 1:45 PM

Are they using a custom Brotli dictionary designed for PDFs? I am not sure whether it would help, but it seems like one of those cases where it might.

Something like this:

https://developer.chrome.com/blog/shared-dictionary-compress...
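Roughly, a shared or custom dictionary primes the compressor with byte strings it expects to see, so even the first occurrence of a common token can be encoded as a back-reference. Python's standard library has no Brotli or zstd bindings, so this sketch swaps in zlib's preset-dictionary (`zdict`) feature, which works on the same principle. The dictionary contents (`PDF_DICT`) and the sample object are made up for illustration. They are not taken from any PDF Association proposal.

```python
import zlib
from typing import Optional

# Hypothetical PDF-oriented dictionary: byte strings that recur in PDF object
# syntax. A real dictionary would be trained on a large corpus of PDFs.
PDF_DICT = (
    b"endstream\nendobj\n0 obj\n<< /Type /Page /Parent 0 R /Resources "
    b"/MediaBox [0 0 612 792] /Contents /Font /Subtype /Type1 /BaseFont "
    b"/Length /Filter /FlateDecode >>\nstream\n"
)


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """zlib-wrapped DEFLATE, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    # The decoder must be handed exactly the same dictionary the encoder used.
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# A made-up PDF page object, used only for the size comparison below.
SAMPLE = (
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Resources 4 0 R "
    b"/MediaBox [0 0 612 792] /Contents 5 0 R >>\nendobj\n"
)

plain = compress(SAMPLE)
primed = compress(SAMPLE, PDF_DICT)
print(len(SAMPLE), len(plain), len(primed))
assert decompress(primed, PDF_DICT) == SAMPLE
```

On a tiny input like this, the primed stream should come out noticeably smaller, because tokens such as `/MediaBox` or `endobj` become back-references into the dictionary. The same mechanism cuts both ways: a dictionary only pays off when its contents actually resemble the data being compressed.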

In my 3D applications, I've been moving away from Brotli because it is just so slow on large files. I prefer zstd, which is something like 10x faster for both compression and decompression.


Replies

whizzx today at 2:48 PM

The PDF Association is still running experiments on real-life workloads to decide whether the gains justify supporting custom dictionaries.

So it might land in the spec once it has been shown to offer enough value.

Proclus today at 6:12 PM

It seems they're using the standard dictionary, which is utterly bizarre.

The standard Brotli dictionary bakes in a ton of assumptions about what the Web looked like in 2015, including not just which HTML tags were particularly common but even which swear words were trendy.

It doesn't seem reasonable to think that PDFs have symbol probabilities remotely similar to the web corpus Google used to come up with that dictionary.

On top of that, it seems utterly daft to bake that into a format expected to serve archival use cases, and thereby impose a 2015 dictionary on PDF readers for a century to come.

I too would strongly prefer that they use zstd.
