Hacker News

hsbauauvhabzb today at 5:44 AM

Presumably, with font kerning and a pixel-perfect recreation of the source, it would be possible to guess the redacted word very accurately.

The strings oioioi and oooiii will have different widths in some fonts because character order matters a lot: kerning is applied per pair of adjacent letters.
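To make the width argument concrete, here is a rough sketch with a toy font. The advance widths and kerning pairs below are made-up numbers, not taken from any real font, and `text_width` and `candidates_matching` are hypothetical helpers for illustration only.

```python
# Toy font metrics in arbitrary font units (assumed values, not from a real font).
ADVANCE = {"o": 500, "i": 250}
KERNING = {("o", "i"): -20, ("i", "o"): -10, ("i", "i"): 5}


def text_width(text):
    """Sum the per-glyph advances, then add the kerning for each adjacent pair."""
    width = sum(ADVANCE[ch] for ch in text)
    width += sum(KERNING.get(pair, 0) for pair in zip(text, text[1:]))
    return width


def candidates_matching(width, words, tolerance=0):
    """Keep only the candidate words whose rendered width fits the measured box."""
    return [w for w in words if abs(text_width(w) - width) <= tolerance]


print(text_width("oioioi"))  # same letters as below, different pair sequence
print(text_width("oooiii"))
print(candidates_matching(text_width("oioioi"), ["oioioi", "oooiii"]))
```

Both strings contain three of each letter, so the plain advances sum to the same value; only the adjacent-pair adjustments differ, and that is enough to tell the two apart from the width of the box.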


Replies

setopt today at 8:31 AM

I suppose it gets a bit more complex again if you enable stuff like microtype, but even then you can probably measure how much inter-letter and inter-word spacing has been adjusted by just scanning other text in the same line.

I think the conclusion is honestly that PDF is an outdated format for keeping records that might have to be redacted in the future, like court documents. Something reflowable like epub could have the text replaced with constant-space black squares instead, so that no hints are leaked, as someone mentioned in a parallel comment.
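As a rough sketch of that idea, assuming "constant-space" means every redaction gets the same number of squares regardless of the original length: `redact` and the sample sentence below are hypothetical, and plain substring replacement is only good enough for a demo.

```python
# Fixed-length placeholder: eight U+2588 FULL BLOCK characters (the length is an arbitrary choice).
PLACEHOLDER = "\u2588" * 8


def redact(text, secrets, placeholder=PLACEHOLDER):
    """Replace each secret with the same placeholder so its length is not leaked."""
    # Longest first, so a secret contained in a longer secret cannot break it up.
    for secret in sorted(secrets, key=len, reverse=True):
        # Naive substring match; a real tool would need proper token boundaries.
        text = text.replace(secret, placeholder)
    return text


print(redact("Paid Al and Bartholomew in cash.", ["Al", "Bartholomew"]))
```

Because the text is reflowable, the rest of the line simply moves up against the placeholder, so the original width of each name is gone from the output.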
